Abstract

In the classical non-adaptive group testing setup, pools of items are tested together, and the main goal 
of a recovery algorithm is to identify the complete defective set given the outcomes of different group 
tests. In contrast, the main goal of a non-defective subset recovery algorithm is to identify a subset of 
non-defective items given the test outcomes. In this paper, we present a suite of computationally efficient 
and analytically tractable non-defective subset recovery algorithms. By analyzing the probability of error 
of the algorithms, we obtain bounds on the number of tests required for non-defective subset recovery 
with arbitrarily small probability of error. Our analysis accounts for the impact of both additive noise
(false positives) and dilution noise (false negatives). By comparing with the information theoretic lower
bounds, we show that the upper bounds on the number of tests are order-wise tight up to a $\log^2 K$
factor, where $K$ is the number of defective items. We also provide simulation results that compare the
relative performance of the different algorithms and provide further insights into their practical utility. 

In terms of the number of measurements required to ensure a given success rate, the proposed algorithms
significantly outperform two straightforward approaches: testing items one-by-one, and first identifying
the defective set and then choosing the non-defective items from its complement.

Index Terms 

Non-adaptive group testing, boolean compressed sensing, non-defective subset recovery, inactive 
subset identification, linear program analysis, combinatorial matching pursuit, sparse signal models. 
I. Introduction

The general group testing framework [T|, El considers a large set of N items, in which an unknown 
subset of K items possess a certain testable property, e.g., the presence of an antigen in a blood sample, 
presence of a pollutant in an air sample, etc. This subset is referred to as the “defective” subset, and its 
complement is referred to as the “non-defective” or “healthy” subset. A defining notion of this framework 
is the group test, a test that operates on a group of items and provides a binary indication as to whether or 
not the property of interest is present collectively in the group. A negative indication implies that none of 
the tested items are defective. A positive indication implies that at least one of the items is defective. In 
practice, due to the hardware and test procedure limitations, the group tests are not completely reliable. 
Using the outcomes of multiple such (noisy) group tests, a basic goal of group testing is to reliably 
identify the defective set of items with as few tests as possible. The framework of group testing has 
found applications in diverse engineering fields such as industrial testing (3), DNA sequencing El, EL 
data pattern mining @-17], medical screening @, multi-access communications |j2], |jS], data streaming 
I®, OH, etc. 

One popular version of the above theme is non-adaptive group testing (NGT), where different tests
are conducted simultaneously, i.e., the tests do not use information provided by the outcome of
any other test. NGT is especially useful when the individual tests are time consuming, so that the
testing time associated with adaptive, sequential testing is prohibitive. An important aspect of NGT is
how to determine the set of items that go into each group test. Two main approaches exist: a
combinatorial approach, see e.g., EH-lEl, which considers explicit constructions of test matrices/pools;
and a probabilistic pooling approach, see e.g., ifTOl, lfl4l, lfl5l, where the items included in each group test
are chosen uniformly at random from the population. Non-adaptive group testing has also been referred
to as boolean compressed sensing in the recent literature lfl6l, fTTl.

In this work, in contrast to the defective set identification problem, we study the healthy/non-defective 
subset identification problem in the noisy, non-adaptive group testing with random pooling (NNGT-R) 
framework. There are many applications where the goal is to identify only a small subset of non-defective 
items. For example, consider the spectrum hole search problem in a cognitive radio (CR) network setup. 
It is known that the primary user occupancy is sparse in the frequency domain, over a wide band of 
interest m, cod. This is equivalent to having a small subset of defective items embedded in a large set
of candidate frequency bins. The secondary users do not need to identify all the frequency bins occupied
by the primary users; they only need to discover a small number of unoccupied sub-bands to set up the
secondary communications. This, in turn, is a non-defective subset identification problem when the bins
to be tested for primary occupancy can be pooled together into group tests EOl. In EH, using information
theoretic arguments, it was shown that, compared to the conventional approach of identifying the
non-defective subset by first identifying the defective set, directly searching for an $L$-sized non-defective
subset reduces the number of tests, especially when $L$ is small compared to $N - K$. The
achievability results in EHl were obtained by analyzing the performance of exhaustive-search-based
algorithms, which are not practically implementable. In this paper, we develop computationally efficient
algorithms for non-defective subset identification in the NNGT-R framework.

We note that the problem of non-defective subset identification is a generalization of the defective
set identification problem, in the sense that, when $L = N - K$, the non-defective subset identification
problem is identical to that of identifying the $K$ defective items. Hence, by setting $L = N - K$, the
algorithms presented in this work can be related to algorithms for finding the defective set; see El for
an excellent collection of existing results and references. In general, for the NNGT-R framework, three
broad approaches have been adopted for defective set recovery ifTTl. First, the row based approach (also
frequently referred to as the "naive" decoding algorithm) finds the defective set by finding all the
non-defective items. The survey in EH lists many variants of this algorithm for finding defective items. More
recently, the CoCo algorithm was studied in IfTTl, where an interesting connection of the naive decoding
algorithm with the classical coupon-collector problem was established for the noiseless case. The second
popular decoding approach is based on finding defective items iteratively (or greedily) by
matching the column of the test matrix corresponding to a given item with the test outcome vector EL
mu, m, Ea. For example, in EH, column matching consists of taking set differences between the
set of pools where the item is tested and the set of pools with positive outcomes. Another variant of
matching is considered in IfTTl, where, for a given column, the ratio of the number of times an item is tested
in pools with positive and negative outcomes is computed and compared to a threshold. A recent work,
Ea, investigates the problem of finding zeros in a sparse vector in the compressive sensing framework,
and also proposes a greedy algorithm based on correlating the columns of the sensing matrix (i.e., column
matching) with the output vector¹. The connection between defective set identification in group testing
and sparse recovery in compressive sensing was further highlighted in Q, ifTT!, ll26l, where relaxation
based linear programming algorithms have been proposed for defective set identification in group testing.
In these works, a class of linear programs for defective set identification was obtained by letting the
boolean variables take real values (between 0 and 1) and setting up inequality or equality constraints to
model the outcome of each pool.

In this work, we develop novel algorithms for identifying a non-defective subset in an NNGT-R 
framework. We present error rate analysis for each algorithm and derive non-asymptotic upper bounds 
on the average error rate. The derivation leads to a theoretical guarantee on the sample complexity, i.e., 
the number of tests required to identify a subset of non-defective items with arbitrarily small probability 
of error. We summarize our main contributions as follows: 

• We propose a suite of computationally efficient and analytically tractable algorithms for identifying
a non-defective subset of a given size in an NNGT-R framework: RoAl (row based), CoAl (column
based), and RoLpAl, RoLpAl++ and CoLpAl (linear program (LP) relaxation based) algorithms.

• We derive bounds on the number of tests that guarantee successful non-defective subset recovery for 
each algorithm. The derived bounds are a function of the system parameters, namely, the number 
of defective items, the size of non-defective subset, the population size, and the noise parameters. 
Further, 

- The presented bounds on the number of tests for the different algorithms are within an $O(\log^2 K)$
factor of the information theoretic lower bounds derived in our past work li2Tl, where $K$ is the
number of defective items.

- For our suite of LP based algorithms, we present a novel analysis technique based on
characterizing the recovery conditions via the dual variables associated with the LP, which may be of
interest in its own right.

• Finally, we present numerical simulations to compare the relative performance of the algorithms. 

¹Note that directly computing correlations between the column vector of an item and the test outcome vector will not work in
the case of group testing, as both vectors are boolean. Furthermore, positive and negative pools play asymmetric roles in the
group testing problem.


March 1, 2016 


DRAFT 




The results also illustrate the significant benefit in finding non-defective items directly, compared 
to using the existing defective set recovery methods or testing items one-by-one, in terms of the 
number of group tests required. 

The rest of the paper is organized as follows. Section II describes the NNGT-R framework and the
problem setup. The proposed algorithms and the main analytical results are presented in Section III.
The proofs of the main results are provided in Section V. Section VI discusses the numerical simulation
results, and the conclusions are presented in Section VII. We conclude this section by presenting the
notation followed throughout the paper.

Notation: Matrices are denoted using uppercase bold letters and vectors are denoted using an underline.
For a given matrix $\mathbf{A}$, $\underline{a}^i$ and $\underline{a}_i$ denote its $i$th row and column, respectively. For a given index set
$\mathcal{S}$, $\mathbf{A}(\mathcal{S},:)$ denotes the sub-matrix of $\mathbf{A}$ consisting of the rows indexed by $\mathcal{S}$. Similarly,
$\mathbf{A}(:,\mathcal{S})$ or $\mathbf{A}_{\mathcal{S}}$ denotes the sub-matrix of $\mathbf{A}$ consisting of the columns indexed by $\mathcal{S}$. For a vector
$\underline{a}$, $a(i)$ denotes its $i$th component; $\mathrm{supp}(\underline{a}) = \{j : a(j) > 0\}$; and $\{\underline{a} = c\}$ denotes the set $\{j : a(j) = c\}$
for any $c$. For a boolean vector $\underline{a}$, $\underline{a}^c$ denotes its component-wise boolean complement.
$\underline{1}_n$ and $\underline{0}_n$ denote the all-one and all-zero vectors, respectively, of size $n \times 1$. We denote component-wise
inequality by $\underline{a} \preceq \underline{b}$, i.e., $a(i) \le b(i)\ \forall i$. Also, $\underline{a} \circ \underline{b}$ denotes the component-wise product,
i.e., $(\underline{a} \circ \underline{b})(i) = a(i)b(i)\ \forall i$. The boolean OR operation is denoted by "$\vee$". For any $q \in [0,1]$, $\mathcal{B}(q)$
denotes the Bernoulli distribution with parameter $q$. $\mathbb{I}_A$ denotes the indicator function, which returns 1 if
the event $A$ is true and 0 otherwise. Further, $x(n) = O(y(n))$ implies that $\exists\, B > 0$ and $n_0 > 0$ such
that $|x(n)| \le B|y(n)|$ for all $n > n_0$; $x(n) = \Omega(y(n))$ implies that $\exists\, B > 0$ and $n_0 > 0$ such
that $|x(n)| \ge B|y(n)|$ for all $n > n_0$; and $x(n) = o(y(n))$ implies that for every $\epsilon > 0$, there exists an
$n_0 > 0$ such that $|x(n)| \le \epsilon|y(n)|$ for all $n > n_0$. All logarithms in this paper are to the base $e$. Finally,
for any $p \in [0,1]$, $H_b(p)$ denotes the binary entropy in nats, i.e., $H_b(p) = -p\log(p) - (1-p)\log(1-p)$.

II. Signal Model 

In our setup, we have a population of $N$ items, out of which $K$ are defective. Let $\mathcal{S}_d \subset [N]$ denote
the defective set, such that $|\mathcal{S}_d| = K$. We consider a non-adaptive group testing framework with random
pooling d, lfT6l, l ITTI l, lf27ll, where the items to be pooled in a given test are chosen at random from the
population. The group tests are defined by a boolean matrix $\mathbf{X} \in \{0,1\}^{M \times N}$ that assigns different items
to the $M$ group tests (pools). The $j$th pool tests the items corresponding to the columns with a 1 in the $j$th
row of $\mathbf{X}$. We consider an i.i.d. random Bernoulli measurement matrix fl6l, where each $X_{ij} \sim \mathcal{B}(p)$ for
some $0 < p < 1$. Thus, $M$ randomly generated pools are specified. In the above, $p$ is a design parameter
that controls the average group size, i.e., the average number of items tested in a single group test.
In particular, we choose $p = \alpha/K$, where the specific value of $\alpha$ is chosen based on the analysis of the
different algorithms.

If the tests are completely reliable, then the output of the $M$ tests is given by the boolean OR of the
columns of $\mathbf{X}$ corresponding to the defective set $\mathcal{S}_d$. However, in practice, the outcome of a group test
may be unreliable. Two popular noise models that are considered in the literature on group testing are
m, ED, El: (a) an additive noise model, where there is a probability $q \in (0, 0.5)$ that the outcome
of a group test containing only non-defective items turns out to be positive (see Fig. 1); and (b) a dilution model,
where there is a probability $u \in (0, 0.5)$ that a given item does not participate in a given group test (see
Fig. 1). Let $\underline{d}_l \in \{0,1\}^M$, where each $d_l(j) \sim \mathcal{B}(1-u)$ is chosen independently for all $j = 1, 2, \ldots, M$ and
for all $l = 1, 2, \ldots, N$, and let $\mathbf{D}_l = \mathrm{diag}(\underline{d}_l)$. The output vector $\underline{y} \in \{0,1\}^M$ can be represented as

$$
\underline{y} = \left( \bigvee_{l=1}^{N} \mathbb{I}_{\{l \in \mathcal{S}_d\}}\, \mathbf{D}_l\, \underline{x}_l \right) \vee \underline{w}, \qquad (1)
$$

where $\underline{x}_l \in \{0,1\}^M$ is the $l$th column of $\mathbf{X}$, and $\underline{w} \in \{0,1\}^M$ is the additive noise with $i$th component
$w(i) \sim \mathcal{B}(q)$. Note that, for the noiseless case, $u = 0$ and $q = 0$. Given the test output vector $\underline{y}$, our goals
are as follows:

(a) To find computationally tractable algorithms to identify $L$ non-defective items, i.e., an $L$-sized subset
of $[N] \setminus \mathcal{S}_d$.

(b) To analyze the performance of the proposed algorithms with the objective of (i) finding the number of
tests and (ii) choosing the appropriate design parameters that lead to non-defective subset recovery
with high probability of success.
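To make the signal model in (1) concrete, the following minimal Python sketch draws one NNGT-R instance. It is our own illustration rather than code from this work: the function name `generate_ngt_instance` and its arguments are hypothetical, the defective set is drawn uniformly at random, and the pooling probability is assumed to be $p = \alpha/K$ with $\alpha$ left as a free parameter.

```python
import random

def generate_ngt_instance(N, K, M, alpha, u, q, seed=None):
    """Draw one instance of the noisy NGT-R model in (1) (illustrative sketch).

    Assumptions: X_ij ~ B(p) with p = alpha / K, each (pool, defective item)
    pair is diluted independently with probability u, and each pool outcome
    is OR-ed with additive noise w(j) ~ B(q).
    Returns (X, y, defective): X is a list of M rows of N bits.
    """
    rng = random.Random(seed)
    p = alpha / K  # assumed pooling probability
    defective = set(rng.sample(range(N), K))  # S_d, uniformly at random
    X = [[1 if rng.random() < p else 0 for _ in range(N)] for _ in range(M)]
    y = []
    for row in X:
        # A pooled defective activates the test unless it is diluted (prob. u).
        hit = any(row[i] == 1 and rng.random() >= u for i in defective)
        noise = rng.random() < q  # additive noise w(j) ~ B(q)
        y.append(1 if (hit or noise) else 0)
    return X, y, defective
```

For example, `generate_ngt_instance(N=200, K=10, M=150, alpha=1.0, u=0.1, q=0.05, seed=0)` returns one random test matrix, its noisy outcomes, and the hidden defective set; setting `u = q = 0` recovers the noiseless boolean OR model.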

In the literature on defective set recovery in group testing or on sparse vector recovery in compressed
sensing, there exist two types of recovery results: (a) Non-uniform/Per-Instance recovery results: these
state that a randomly chosen test matrix leads to successful recovery with high probability for a given,
fixed defective set; and (b) Uniform/Universal recovery results: these state that a random draw of the
test matrix leads to successful recovery with high probability for all possible defective sets.
Non-uniform results can easily be extended to the uniform case using union bounds. Hence, we focus
mainly on non-uniform recovery results, and demonstrate the extension to the uniform case for one of the
proposed algorithms (see Corollary 1). Note that the non-uniform scenario is equivalent to the uniform
recovery scenario when the defective set is chosen uniformly at random from the set of $\binom{N}{K}$ possible
choices. For the latter scenario, information theoretic lower bounds on the number of tests for the
non-defective subset recovery problem were derived in ETI using Fano's inequality. We use these bounds
in assessing the performance of the proposed algorithms (see Section IV). For ease of reference, we
summarize these results in Table I.

Fig. 1. Impact of different types of noise on the group testing signal model.

For later use, we summarize some key facts pertaining to the above signal model in the lemma below.
For any $l \in [M]$ and $k \in [N]$, let $X_{lk}$ denote the $(l,k)$th entry of the test matrix $\mathbf{X}$, and let $Y_l = y(l)$
denote the $l$th test output. With $u$, $q$ and $p$ as defined above, let $\Gamma = (1-q)\,(1-(1-u)p)^K$ and
$\gamma_0 = \frac{u}{1-(1-u)p}$. Then, it follows that:

Lemma 1. (a) $P(Y_l = 0) = \Gamma$.

(b) For any $j \notin \mathcal{S}_d$, $P(Y_l \mid X_{lj}) = P(Y_l)$.

(c) For any $i \in \mathcal{S}_d$, $P(Y_l = 0 \mid X_{li} = 1) = \gamma_0 \Gamma$ and $P(Y_l = 0 \mid X_{li} = 0) = \frac{\Gamma}{1-(1-u)p}$. Further, using Bayes'
rule, $P(X_{li} = 1 \mid Y_l = 0) = p\gamma_0$.

(d) Given $Y_l$, $X_{li}$ is independent of $X_{lj}$ for any $i \in \mathcal{S}_d$ and $j \notin \mathcal{S}_d$.

The proof is provided in Appendix A.
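The closed forms in Lemma 1(a) and (c) can be sanity-checked numerically. The sketch below is our own check, not part of the proof; the helper names are hypothetical. It simulates a single pool under (1), tracking only the $K$ defective items (non-defective items do not affect the outcome), and compares empirical frequencies with $\Gamma = (1-q)(1-(1-u)p)^K$, $\gamma_0\Gamma$ and $p\gamma_0$, where $\gamma_0 = u/(1-(1-u)p)$.

```python
import random

def lemma1_closed_form(K, p, u, q):
    """Return (Gamma, gamma_0) for a single pool."""
    a = 1 - (1 - u) * p  # P(a given defective does not activate the pool)
    return (1 - q) * a ** K, u / a

def lemma1_monte_carlo(K, p, u, q, trials, seed=0):
    """Estimate P(Y=0), P(Y=0 | X_li=1) and P(X_li=1 | Y=0) for a defective item i.

    Index 0 among the simulated defectives plays the role of item i.
    """
    rng = random.Random(seed)
    n_zero = n_tested = n_zero_and_tested = 0
    for _ in range(trials):
        pooled = [rng.random() < p for _ in range(K)]  # X_lk ~ B(p)
        # A pooled defective activates the pool unless diluted (prob. u).
        active = any(b and rng.random() >= u for b in pooled)
        y = 1 if (active or rng.random() < q) else 0  # additive noise ~ B(q)
        n_zero += (y == 0)
        if pooled[0]:
            n_tested += 1
            n_zero_and_tested += (y == 0)
    return (n_zero / trials,
            n_zero_and_tested / n_tested,
            n_zero_and_tested / n_zero)
```

With $10^5$ trials, the three estimates should typically agree with $\Gamma$, $\gamma_0\Gamma$ and $p\gamma_0$ to about two decimal places.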

III. Algorithms and Main Results 

We now present several algorithms for non-defective/healthy subset recovery. Each algorithm takes the
observed noisy test output vector $\underline{y} \in \{0,1\}^M$ and the test matrix $\mathbf{X} \in \{0,1\}^{M \times N}$ as inputs, and outputs
a set of $L$ items, $\mathcal{S}_L$, that have been declared non-defective. The recovery is successful if the declared
set does not contain any defective item, i.e., $\mathcal{S}_L \cap \mathcal{S}_d = \emptyset$. For each algorithm, we derive upper bounds
on the average probability of error, which are then used to derive the number of tests required for
successful non-defective subset recovery.

A. Row Based Algorithm 

Our first algorithm to find non-defective items is also the simplest and the most intuitive one. We make 
use of the basic fact of group testing that, in the noiseless case, if the test outcome is negative, then all 
the items being tested are non-defective. 

RoAl (Row based algorithm): 

• Compute $\underline{z} = \sum_{j \in \mathrm{supp}(\underline{y}^c)} \underline{x}^j$, where $\underline{x}^j$ is the $j$th row of the test matrix.

• Sort the entries of $\underline{z}$ in descending order.

• Declare the items indexed by the top L entries as the non-defective subset. 

That is, declare as non-defective the $L$ items that have been tested the largest number of times in pools
with negative outcomes. The above decoding algorithm proceeds by considering only the tests
with negative outcomes. Note that, when the test outcomes are noisy, there is a nonzero probability of 
declaring a defective item as non-defective. In particular, the dilution noise can lead to a test containing 
defective items in the pool being declared negative, leading to a possible misclassification of the defective 
items. On the other hand, since the algorithm only considers tests with negative outcomes, additive noise 
does not lead to misclassification of defective items as non-defective. However, the additive noise does 
lead to an increased number of tests as the algorithm has to possibly discard many of the pools that 
contain only non-defective items. 
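The three RoAl steps translate directly into code. The following Python sketch is a minimal illustration under our own conventions (a dense 0/1 list-of-lists test matrix, ties broken by item index); the function name `roal` is hypothetical.

```python
def roal(X, y, L):
    """RoAl sketch: rank items by how often they appear in negative pools.

    X: list of M rows, each a list of N bits; y: list of M outcome bits.
    Returns the indices of the L items declared non-defective.
    """
    N = len(X[0])
    z = [0] * N
    for row, outcome in zip(X, y):
        if outcome == 0:  # only pools in supp(y^c) contribute
            for i, bit in enumerate(row):
                z[i] += bit
    # Descending order of z; ties broken by item index (an arbitrary choice).
    order = sorted(range(N), key=lambda i: (-z[i], i))
    return order[:L]
```

In the noiseless case ($u = q = 0$), any item with $z(i) > 0$ appears in a negative pool and is therefore certainly non-defective; under dilution noise this guarantee is lost, which is the failure mode described above.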

We note that existing row based algorithms for finding the defective set 0, ifTTl can be obtained as a
special case of the above algorithm by setting $L = N - K$, i.e., by looking for all non-defective items.
However, the analysis in past work does not quantify the impact of the parameter $L$, and that is our
main goal here. We characterize the number of tests $M$ required to find $L$ non-defective items
with high probability of success using RoAl in Theorem 1.
B. Column Based Algorithm


The column based algorithm is based on matching the columns of the test matrix with the test outcome 
vector. A non-defective item does not impact the output and hence the corresponding column in the test 
matrix should be “uncorrelated” with the output. On the other hand, “most” of the pools that test a 
defective item should test positive. This forms the basis of distinguishing a defective item from a
non-defective one. The specific algorithm is as follows:

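CoAl (Column based algorithm): Let $\psi_{cb} \ge 0$ be any constant.

• For each $i = 1, \ldots, N$, compute

$$
T(i) = \underline{x}_i^T \underline{y}^c - \psi_{cb}\,(\underline{x}_i^T \underline{y}), \qquad (2)
$$

where $\underline{x}_i$ is the $i$th column of $\mathbf{X}$.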

• Sort T(i) in descending order. 

• Declare the items indexed by the top L entries as the non-defective subset. 
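A matching sketch for CoAl follows. Each negative pool containing item $i$ adds 1 to $T(i)$ and each positive pool containing it subtracts $\psi_{cb}$, which reproduces the statistic in (2). The function name `coal` and the tie-breaking rule are our own choices, not part of the algorithm's specification.

```python
def coal(X, y, L, psi_cb):
    """CoAl sketch: T(i) = x_i^T y^c - psi_cb * x_i^T y; keep the L largest.

    X: list of M rows, each a list of N bits; y: list of M outcome bits.
    """
    N = len(X[0])
    T = [0.0] * N
    for row, outcome in zip(X, y):
        # +1 for every negative pool, -psi_cb for every positive pool.
        weight = 1.0 if outcome == 0 else -psi_cb
        for i, bit in enumerate(row):
            if bit:
                T[i] += weight
    # Descending order of T; ties broken by item index.
    order = sorted(range(N), key=lambda i: (-T[i], i))
    return order[:L]
```

With `psi_cb = 0` the score $T(i)$ equals RoAl's count $z(i)$, so CoAl reduces to RoAl; a positive `psi_cb` additionally penalizes items that appear often in positive pools.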


We note that, in contrast to the row based algorithm, CoAl works with pools of both negative
and positive test outcomes (when the parameter $\psi_{cb} > 0$; its choice is explained below). For both RoAl
and CoAl, by analyzing the probability of error, we can derive the sufficient number of tests required to
achieve arbitrarily small error rates. We summarize the main result in the following theorem:

Theorem 1. (Non-Uniform recovery with RoAl and CoAl) Let $\Gamma = (1-q)(1-(1-u)p)^K$ and
$\gamma_0 = \frac{u}{1-(1-u)p}$. Suppose $K > 1$ and let $p$ be chosen as $\alpha/K$ with $\alpha = \frac{1}{1-u}$. For RoAl, let $\psi_0 = 0$. For CoAl,
choose $\psi_0 = \ldots$ and set $\psi_{cb} = \psi_0$. Let $c_0 > 0$ be any constant. Then, there exist absolute constants
$C_{a1}, C_{a2} > 0$, independent of $N$, $L$ and $K$ and different for each algorithm, such that, if the number of
tests is chosen as

$$
M \ge (1+c_0)\,\frac{K(1-u)}{(1-q)(1-\gamma_0)^2(1+\psi_0)}\left[ C_{a1}\,\frac{\log\left(K\binom{N-K}{L-1}\right)}{(N-K)-(L-1)} + C_{a2}\log K \right], \qquad (3)
$$

then, for a given defective set, the algorithms RoAl and CoAl find $L$ non-defective items with probability
exceeding $1 - \exp\left(-c_0\log\left(K\binom{N-K}{L-1}\right)\right) - \exp(-c_0\log K)$.

The following corollary extends Theorem 1 to uniform recovery of a non-defective subset using RoAl
and CoAl.

Corollary 1. (Uniform recovery with RoAl and CoAl) For any constant $c_0 > 0$, there exist
absolute constants $C_{a1}, C_{a2} > 0$, independent of $N$, $L$ and $K$ and different for each algorithm, such that,
if the number of tests is chosen as

$$
M \ge (1+c_0)\,\frac{K(1-u)}{(1-q)(1-\gamma_0)^2(1+\psi_0)}\left[ C_{a1}\,\frac{\log(\cdots)}{(N-K)-(L-1)} + C_{a2}\log N \right], \qquad (4)
$$

then, for any defective set, the algorithms RoAl and CoAl find $L$ non-defective items with probability
exceeding $1 - \exp\left(-c_0\log(\cdots)\right) - \exp(-c_0\log N)$.

The proof of the above theorem is presented in Section V-A. It is tempting to compare the performance
of RoAl and CoAl by comparing the required number of tests as presented in (3). However, such
comparisons must be made keeping in mind that the required numbers of tests in (3) are based
on an upper bound on the average probability of error. The main objective of these results is to provide
a guarantee on the number of tests required for non-defective subset recovery and to highlight the
order-wise dependence of the number of tests on the system parameters. For a comparison of the relative
performance of the algorithms, we refer the reader to Section VI, where we present numerical results
obtained from simulations. From the simulations, we observe that CoAl performs better than RoAl in
most scenarios of interest. This is because, in contrast to RoAl, CoAl uses the information obtained from
pools with both negative and positive test outcomes.


C. Linear program relaxation based algorithms 

In this section, we consider linear program (LP) relaxations of the non-defective subset recovery
problem and identify the conditions under which such LP relaxations lead to recovery of a non-defective
subset with high probability of success. These algorithms are inspired by analogous algorithms studied
in the context of defective set recovery in the literature ifTTl, ll26ll. However, past analyses of the number
of tests for defective set recovery do not carry over to non-defective subset recovery, because the
goals of the algorithms are very different. Let $\mathcal{Y}_z = \{l \in [M] : y(l) = 0\}$, i.e., $\mathcal{Y}_z$ is the index set of all
the pools whose test outcomes are negative, and let $M_z = |\mathcal{Y}_z|$. Similarly, let $\mathcal{Y}_p = \{l \in [M] : y(l) = 1\}$ and
$M_p = |\mathcal{Y}_p|$. Define the following linear program, with optimization variables $\underline{z} \in \mathbb{R}^N$ and $\underline{\eta}_z \in \mathbb{R}^{M_z}$:

$$
\begin{aligned}
\text{(LP0)}\qquad \underset{\underline{z},\,\underline{\eta}_z}{\text{minimize}}\quad & \underline{1}_{M_z}^T\,\underline{\eta}_z & (5)\\
\text{subject to}\quad & \mathbf{X}(\mathcal{Y}_z,:)(\underline{1}_N - \underline{z}) - \underline{\eta}_z = \underline{0}_{M_z}, & (6)\\
& \underline{0}_N \preceq \underline{z} \preceq \underline{1}_N,\quad \underline{\eta}_z \succeq \underline{0}_{M_z},\\
& \underline{1}_N^T\,\underline{z} \le L.
\end{aligned}
$$

Consider the following algorithm:

RoLpAl (LP relaxation with negative outcome pools only)²

• Set up and solve LP0. Let $\hat{\underline{z}}$ be the solution of LP0.

• Sort the entries of $\hat{\underline{z}}$ in descending order.

• Declare the items indexed by the top $L$ entries as the non-defective subset.

The above program relaxes the combinatorial problem of choosing $L$ out of $N$ items by allowing the
boolean variables to take real values between 0 and 1, as long as the constraints imposed by the negative
pools, specified in (6), are met. Intuitively, the variable $z(i)$ (or the variable $1 - z(i)$) can be thought of
as the confidence with which item $i$ is declared non-defective (or defective). The constraint
$\underline{1}_N^T\underline{z} \le L$ forces the program to assign high values (close to 1) to approximately the top $L$ entries
only, which are then declared non-defective.

For the purpose of analysis, we first derive sufficient conditions for correct non-defective subset recovery
with RoLpAl in terms of the dual variables of LP0. We then derive the number of tests required to satisfy
these sufficient conditions with high probability. The following theorem summarizes the performance
of the above algorithm:

Theorem 2. (Non-Uniform recovery with RoLpAl) Let $K > 1$ and let $p$ be chosen as $\alpha/K$ with $\alpha = \frac{1}{1-u}$.
If the number of tests is chosen as in (3) with $\psi_0 = 0$, then, for a given defective set, there exist absolute
constants $C_{a1}, C_{a2} > 0$, independent of $N$, $L$ and $K$, such that RoLpAl finds $L$ non-defective items with
probability exceeding $1 - \exp\left(-c_0\log\left(K\binom{N-K}{L-1}\right)\right) - \exp(-c_0\log K)$.

²The other algorithms presented in this sub-section, namely, RoLpAl++ and CoLpAl, have the same structure and differ only
in the linear program being solved.
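The proof of the above theorem is presented in Section V-B. Note that LP0 operates only on the
set of pools with negative outcomes and is, thus, sensitive to dilution noise, which can lead to the
misclassification of a defective item as non-defective. To combat this, we can also leverage the information
available from the pools with positive outcomes, by incorporating constraints on the variables involved
in these tests. Consider the following linear program with optimization variables $\underline{z} \in \mathbb{R}^N$ and $\underline{\eta}_z \in \mathbb{R}^{M_z}$:

$$
\begin{aligned}
\text{(LP1)}\qquad \underset{\underline{z},\,\underline{\eta}_z}{\text{minimize}}\quad & \underline{1}_{M_z}^T\,\underline{\eta}_z & (7)\\
\text{subject to}\quad & \mathbf{X}(\mathcal{Y}_z,:)(\underline{1}_N - \underline{z}) - \underline{\eta}_z = \underline{0}_{M_z}, &\\
& \mathbf{X}(\mathcal{Y}_p,:)(\underline{1}_N - \underline{z}) \succeq (1-\epsilon_0)\,\underline{1}_{M_p}, & (8)\\
& \underline{0}_N \preceq \underline{z} \preceq \underline{1}_N,\quad \underline{\eta}_z \succeq \underline{0}_{M_z},\\
& \underline{1}_N^T\,\underline{z} \le L.
\end{aligned}
$$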



In the above, $0 < \epsilon_0 < 1$ is a small positive constant. Note that (8) attempts to model, in terms of
real variables, the boolean statement that each test with a positive outcome contains at least one
defective item. We refer to the algorithm based on LP1 as RoLpAl++. We expect RoLpAl++
to outperform RoLpAl, as the constraint (8) can provide further differentiation between items that are
indistinguishable on the basis of negative pools alone. Note that, due to the constraint $\underline{1}_N^T\underline{z} \le L$, the entries
of $\underline{z}$ in $[N] \setminus \mathcal{S}_L$ are generally assigned small values. Hence, when $L$ is small, the constraint (8) may not be
active for many of the positive pools. Thus, we expect the advantage of RoLpAl++ over RoLpAl to grow as
the value of $L$ increases; this is confirmed via simulation results in Section VI. Due to the difficulty
of obtaining estimates of the dual variables associated with the constraints (8), it is difficult to derive
theoretical guarantees for RoLpAl++. However, we expect the guarantees for RoLpAl++ to be similar
to those for RoLpAl, and we refer the reader to Appendix F for a discussion.
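One way to see the link between RoAl and RoLpAl is to eliminate $\underline{\eta}_z$ from LP0 using (6). The following is a short derivation sketch of ours for LP0 exactly as stated; it is not the proof of Theorem 2:

$$
\underline{1}_{M_z}^T\underline{\eta}_z
= \underline{1}_{M_z}^T\mathbf{X}(\mathcal{Y}_z,:)\,\underline{1}_N - \underline{c}^T\underline{z},
\qquad
\underline{c} \triangleq \mathbf{X}(\mathcal{Y}_z,:)^T\,\underline{1}_{M_z}.
$$

Here $c(i)$ counts the negative pools that contain item $i$, i.e., $\underline{c}$ is exactly the vector computed by RoAl. The first term does not depend on $\underline{z}$, and $\underline{\eta}_z \succeq \underline{0}_{M_z}$ holds automatically because $\mathbf{X}$ is nonnegative and $\underline{z} \preceq \underline{1}_N$. LP0 therefore reduces to maximizing $\underline{c}^T\underline{z}$ subject to $\underline{0}_N \preceq \underline{z} \preceq \underline{1}_N$ and $\underline{1}_N^T\underline{z} \le L$. Since $\underline{c} \succeq \underline{0}_N$, setting $z(i) = 1$ on $L$ indices with the largest $c(i)$ and $z(i) = 0$ elsewhere is optimal, so, apart from ties (which may also admit fractional optima), RoLpAl and RoAl declare the same items non-defective.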

Motivated by the connection between RoAl and RoLpAl, as revealed in the proof of Theorem 2 (see
Section V-B), we now propose another LP based non-defective subset recovery algorithm that incorporates
both positive and negative pools and which, in contrast to RoLpAl++, turns out to be analytically tractable.
By incorporating (8) in an unconstrained form and by using the same weights for all the associated
Lagrange multipliers in the objective function, we get:


$$
\begin{aligned}
\text{(LP2)}\qquad \underset{\underline{z}}{\text{minimize}}\quad & \underline{1}_{M_z}^T\mathbf{X}(\mathcal{Y}_z,:)(\underline{1}_N - \underline{z}) - \psi_{lp}\,\underline{1}_{M_p}^T\mathbf{X}(\mathcal{Y}_p,:)(\underline{1}_N - \underline{z}) &\\
\text{subject to}\quad & \underline{0}_N \preceq \underline{z} \preceq \underline{1}_N, & (9)\\
& \underline{1}_N^T\,\underline{z} \le L,
\end{aligned}
$$

where $\psi_{lp} > 0$ is a constant that provides appropriate weights to the two different types of
cumulative errors. Note that, compared to LP1, we have also eliminated the equality constraints in the
above program. The basic intuition is that by using (8) in an unconstrained form, i.e., by maximizing
$\sum_{j \in \mathcal{Y}_p} \mathbf{X}(j,:)(\underline{1}_N - \underline{z})$, the program will tend to assign higher values to $1 - z(i)$ (and hence lower
values to $z(i)$) for $i \in \mathcal{S}_d$, since, for random test matrices with i.i.d. entries, the defective items are likely
to be tested more often in pools with positive outcomes. Also, in contrast to LP1, where a
different weight is given to each positive pool via the value of the associated dual variable, LP2 gives
the same weight to each positive pool, but adjusts the overall weight of the positive pools using the
constant $\psi_{lp}$. We refer to the algorithm based on LP2 as CoLpAl. The theoretical analysis of CoLpAl
follows along similar lines as that of RoLpAl, and we summarize the main result in the following theorem:
Theorem 3. (Non-Uniform recovery with CoLpAl) Let $\Gamma = (1-q)(1-(1-u)p)^K$ and $\gamma_0 = \frac{u}{1-(1-u)p}$.
Let $K > 1$ and let $p$ be chosen as $\alpha/K$ with $\alpha = \frac{1}{1-u}$. Let $\psi_0 = \min\{\ldots\}$ and set $\psi_{lp} = \psi_0$.

Then, for any positive constant $c_0$, there exist absolute constants $C_{a1}, C_{a2} > 0$, independent of $N$, $L$ and $K$,
such that, if the number of tests is chosen as in (3) with $\psi_0 = 0$, then, for a given defective set, CoLpAl finds
$L$ non-defective items with probability exceeding $1 - 2\exp\left(-c_0\log\left(K\binom{N-K}{L-1}\right)\right) - \exp(-c_0\log K)$.

An outline of the proof of the above theorem is presented in Section V-C.
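A similar elimination applies to LP2: its objective equals a constant minus $\sum_i T(i)\,z(i)$, where $T(i) = \underline{x}_i^T\underline{y}^c - \psi_{lp}\,\underline{x}_i^T\underline{y}$ is the CoAl statistic in (2) with $\psi_{cb} = \psi_{lp}$. Because the feasible set $\{\underline{0}_N \preceq \underline{z} \preceq \underline{1}_N,\ \underline{1}_N^T\underline{z} \le L\}$ has only 0/1 vertices with at most $L$ ones (for integer $L$), an optimum places $z(i) = 1$ on the (at most) $L$ items with the largest positive $T(i)$. The Python sketch below is our own check, not the solver used in this work; it compares this closed form with brute-force enumeration of those vertices on a toy instance, and all function names are hypothetical.

```python
from itertools import combinations

def lp2_objective(X, y, z, psi_lp):
    """Objective of LP2 in (9) for a candidate vector z."""
    total = 0.0
    for row, outcome in zip(X, y):
        slack = sum(bit * (1 - zi) for bit, zi in zip(row, z))  # X(j,:)(1_N - z)
        total += slack if outcome == 0 else -psi_lp * slack
    return total

def lp2_closed_form(X, y, L, psi_lp):
    """Optimal z for LP2: z(i) = 1 on the L largest positive CoAl scores T(i)."""
    N = len(X[0])
    T = [0.0] * N
    for row, outcome in zip(X, y):
        w = 1.0 if outcome == 0 else -psi_lp
        for i, bit in enumerate(row):
            if bit:
                T[i] += w
    order = sorted(range(N), key=lambda i: (-T[i], i))
    z = [0.0] * N
    for i in order[:L]:
        if T[i] > 0:  # items with non-positive score never lower the objective
            z[i] = 1.0
    return z

def lp2_brute_force(X, y, L, psi_lp):
    """Minimum of LP2 over all 0/1 vertices with at most L ones."""
    N = len(X[0])
    best = None
    for k in range(L + 1):
        for S in combinations(range(N), k):
            z = [1.0 if i in S else 0.0 for i in range(N)]
            val = lp2_objective(X, y, z, psi_lp)
            best = val if best is None else min(best, val)
    return best
```

Because $T(i)$ coincides with the CoAl score, this suggests that CoLpAl and CoAl rank items identically when $\psi_{lp} = \psi_{cb}$, up to ties.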


IV. Discussion on the Theoretical Guarantees 

We now present some interesting insights by analyzing the number of tests required for correct
non-defective subset identification by the proposed recovery algorithms. We note that the expression in (3),
adapted for the different algorithms, differs only in the constants involved. This allows us to
present a unified analysis for all the algorithms.

(a) Asymptotic analysis of $M$ as $N \to \infty$: We consider parameter regimes where $K, L \to \infty$
as $N \to \infty$. We note that, under these regimes, when the conditions specified in the theorems
are satisfied, the probability of decoding error can be made arbitrarily close to zero. In particular,
we consider the regime where $K/N \to \beta_0$ and $L/N \to \alpha_0$ as $N \to \infty$, where $\alpha_0 + \beta_0 < 1$.
Also, note that $\gamma_0 \to u$ as $N \to \infty$. Define $\zeta \triangleq \frac{L}{N-K}$, so that $\zeta \to \zeta_0 \triangleq \frac{\alpha_0}{1-\beta_0}$
as $N \to \infty$. Using Stirling's formula, it can be shown that

$$
\lim_{N \to \infty} \frac{\log\binom{N-K}{L-1}}{(N-K)-(L-1)} \le \frac{H_b(\zeta_0)}{1-\zeta_0}
$$

(see ED), where $H_b(\cdot)$ is the binary entropy function. Further, let $g(\zeta) \triangleq \frac{H_b(\zeta)}{1-\zeta}$. Now, since $g(\zeta_0)$ is
a constant, the sufficient number of tests $M$ for the proposed algorithms depends on $K$ as

$$
M \ge \frac{C_0 K}{(1-u)(1-q)}\left( C_{a1}\, g(\zeta_0) + C_{a2}\log K + O(1) \right).
$$

Here, $C_0$, $C_{a1}$ and $C_{a2}$ are constants independent of $N$, $K$, $L$, $u$ and $q$.


We compare the above with the sufficient number of tests required by defective set recovery
algorithms. When $K$ grows sub-linearly with $N$ (i.e., $\beta_0 = 0$), the sufficient number of tests for
the proposed decoding algorithms is $O(K\log K)$, which is better than the sufficient number of tests
for finding the defective set, which scales as $O(K\log N)$ fI71, 1231. In contrast, in the regime where
$K$ grows linearly with $N$ (i.e., $\beta_0 > 0$), the performance of the proposed algorithms is order-wise
equivalent to that of defective set recovery algorithms.

We also compare the uniform recovery results. The sufficient number of tests for uniform recovery, as given in Corollary 1 for the algorithms RoAl and CoAl, is $M = O(K \log N)$. This is significantly better than for defective set recovery algorithms, where the sufficient number of tests scales as $O(K^2 \log(N/K))$.

(b) Variation of M with L: Let $\zeta$ and $g(\zeta)$ be as defined above. We note that the parameter L impacts M only via the function $g(\zeta)$. Lemma 2 in Appendix E shows that, for small (or even moderately high) values of $\zeta$, $g(\zeta)$ is upper bounded by an affine function of $\zeta$. This, in turn, shows that the sufficient number of tests is also approximately affine in L. This is also confirmed via the simulation results in Section VI.

(c) Comparison with the information theoretic lower bounds: We compare with the lower bounds on the number of tests for non-defective subset recovery, as tabulated in Table I. For the noiseless case, i.e., $u = 0$, $q = 0$, the sufficient number of tests is within an $O(\log^2 K)$ factor of the lower bound. For the additive-noise-only case, the proposed algorithms incur a factor of $1/(1-q)$ increase in M. In contrast, the lower bounds indicate that the number of tests is insensitive to additive noise when q is close to 0 (in particular, when $q < 1/K$). For the dilution noise case, the algorithms incur a factor of $1/(1-u)$ increase in M, which is the same as in the lower bound. We have also compared the number of tests obtained via simulations with an exact computation of the lower bounds. Interestingly, the algorithms fall within an $O(\log K)$ factor of the lower bounds; we refer the reader to Figure 4 in Section VI.

(d) Defective set recovery via non-defective subset recovery: It is interesting to note that by substituting $L = N - K$ in (3), we get $M = O(K \log N)$. This is order-wise the same as the number of tests required for defective set identification derived in the existing literature [17], [23], [28].

(e) Robustness under uncertainty in the knowledge of K: The theoretical guarantees presented in the above theorems hold provided the design parameter p is chosen as $O\left(\frac{1}{(1-u)K}\right)$. This requires knowledge of u and K. Note that the implementation of the recovery algorithms does not require us to know the values of K or u. These system model parameters are only required to choose the value of p for constructing the test matrix. If u and K are unknown, similar guarantees can be derived, with a penalty on the number of tests. For example, choosing p as $O(1/K)$, i.e., independent of u, increases the number of tests by a factor that depends only on u.

The impact of using an imperfect value of K can also be quantified. Let $\hat{K}$ be the value used to design the test matrix, and let $A_k > 0$ be such that $\hat{K} = A_k K$. That is, $A_k$ parametrizes the estimation error in K. For large n, $(1 - a/n)^n \approx \exp(-a)$. It follows that with $p = O(1/\hat{K})$, the number of tests increases approximately by a factor of $f_M(A_k) = A_k \exp\left((1-u)\left(\frac{1}{A_k} - 1\right)\right)$, compared to the case with perfect knowledge of K, i.e., with $p = O(1/K)$. Hence, the proposed algorithms are robust to uncertainty in the knowledge of K. For example, with $u = 0.05$, $f_M(1.5) = 1.09$, i.e., a 50% error in the estimation of K leads to only a 9% increase in the number of tests. Furthermore, $f_M(A_k)$ is asymmetric (e.g., $f_M(1.5) = 1.09$ and $f_M(0.5) = 1.3$). This suggests that the algorithms are more robust when $A_k > 1$ than when $A_k < 1$. We also corroborate this behavior via numerical simulations (see Table II).
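The inflation factor can be evaluated directly. This sketch assumes the form $f_M(A_k) = A_k \exp\left((1-u)\left(\frac{1}{A_k}-1\right)\right)$, which is our reading of the expression. That form follows from taking the number of tests to scale as $1/(p\Gamma)$ with $p = 1/\hat{K}$, together with $(1-a/n)^n \approx e^{-a}$. The function name `f_M` is illustrative. With $u = 0.05$, the simulation setting, it reproduces the quoted values 1.09 and 1.3.

```python
import math

def f_M(a_k, u):
    """Approximate factor by which the number of tests grows when the test matrix
    is designed with K_hat = a_k * K (p proportional to 1/K_hat) instead of K."""
    return a_k * math.exp((1 - u) * (1 / a_k - 1))

u = 0.05
for a_k in (0.5, 0.75, 1.0, 1.5, 2.0):
    print(f"A_k = {a_k:4}: f_M = {f_M(a_k, u):.3f}")
```

At $u = 0$, the same expression gives approximately 1.07 for $A_k = 1.5$ and 1.36 for $A_k = 0.5$.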

(f) Operational complexity: The execution of RoAl and CoAl requires $O(MN)$ operations, where M is the number of tests. The complexity of the LP based algorithms RoLpAl, RoLpAl++ and CoLpAl is implementation dependent but, in general, much higher than that of RoAl and CoAl. For example, an interior-point method based implementation requires $O(N^2(M+N)^{3/2})$ operations [29]. Although this is higher than for RoAl and CoAl, it is still attractive in comparison to brute-force search based maximum likelihood methods, because its complexity is polynomial.

TABLE I
Finding a subset of L non-defective items: order results for the necessary number of group tests, which hold asymptotically as $N \to \infty$, $K/N \to \beta_0$, $L/N \to \alpha_0$ and $\alpha_0 + \beta_0 < 1$ (see Theorem 3, [21]).

Noise model                        Necessary number of tests
No noise ($u = 0$, $q = 0$)        $\Omega\left(\frac{K}{\log K}\log\frac{1-\beta_0}{1-\alpha_0-\beta_0}\right)$
Dilution noise ($u > 0$, $q = 0$)  $\Omega\left(\frac{K}{(1-u)\log K}\log\frac{1-\beta_0}{1-\alpha_0-\beta_0}\right)$
Additive noise ($u = 0$, $q > 0$)  $\Omega\left(\frac{K}{\min\{\log(1/q), \log K\}}\log\frac{1-\beta_0}{1-\alpha_0-\beta_0}\right)$


V. Proofs of the Main Results 

We begin by defining some quantities and terminology that are common to all the proofs. In the following, we denote the defective set by $S_d$, such that $S_d \subset [N]$ and $|S_d| = K$. We denote the set of L non-defective items output by the decoding algorithm by $S_L$. For a given defective set $S_d$, $\mathcal{E} = \{S_L \cap S_d \neq \emptyset\}$ denotes the error event, i.e., the event that a given decoding algorithm outputs an incorrect non-defective subset, and $\Pr(\mathcal{E})$ denotes its probability. Define $N_0 = (N-K)-(L-1)$. We further let $S_z \subset [N] \setminus S_d$ denote any set of non-defective items such that $|S_z| = N_0$, and we let $\mathcal{S}_z$ denote the collection of all such sets. Note that $|\mathcal{S}_z| = \binom{N-K}{L-1}$. Finally, recall from Lemma 1 that $\Gamma = (1-q)(1-(1-u)p)^K$ and

$$\gamma_0 \triangleq \frac{u}{1-(1-u)p}.$$


A. Proof of Theorem 1 and Corollary 1


The proof involves upper bounding the probability of non-defective subset recovery error of the decoding algorithms RoAl and CoAl, and identifying the parameter regimes where this probability can be made sufficiently small.

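For CoAl, recall that we compute the metric $T(i) = \mathbf{x}_i^T \mathbf{y}^c - \psi_{cb}\, \mathbf{x}_i^T \mathbf{y}$ for each item i, and output the set of items with the L largest metrics as the non-defective set. Clearly, for any item $i \in S_d$, if $i \in S_L$, then there exists a set $S_z$ of non-defective items such that $T(j) \le T(i)$ for all items $j \in S_z$. The reason is as follows. At most $L-1$ other items are selected along with i. Hence at least $(N-K)-(L-1) = N_0$ non-defective items lie outside $S_L$, and each of them has a metric no larger than $T(i)$. Thus, for CoAl, it follows that

$$\mathcal{E} \subseteq \bigcup_{i \in S_d} \{i \in S_L\} \subseteq \bigcup_{i \in S_d} \bigcup_{S_z \in \mathcal{S}_z} \bigcap_{j \in S_z} \{T(j) \le T(i)\}. \qquad (10)$$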


The algorithm RoAl succeeds when there exists a set of at least L non-defective items that have been tested more times than any of the defective items in the tests with negative outcomes. The number of times an item i is tested in tests with negative outcomes is $z(i) = \mathbf{x}_i^T \mathbf{y}^c$, which is computed by RoAl. Hence, for any item $i \in S_d$, if $i \in S_L$, then there exists a set $S_z$ of non-defective items such that $z(j) \le z(i)$ for all items $j \in S_z$. Thus (10) applies to RoAl also, with T replaced by z. Also, note that $z(i) = T(i)|_{\psi_{cb}=0}$. This allows us to unify the subsequent steps in the proof for the two algorithms. We first work with the quantity $T(i)$ and later specialize the results for each algorithm.

The overall intuition for the proof is as follows. For any i, since $T(i)$ is a sum of independent random variables, it will tend to concentrate around its mean value. For any $i \in S_d$ and $j \notin S_d$, we will show that the mean value of $T(j)$ is larger than that of $T(i)$. Thus, we expect the probability of the error event defined in (10) to be small.

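For any $i \in S_d$ and any $j \notin S_d$, define $\mu_i = E(T(i))$, $\mu_j = E(T(j))$, $\sigma_i^2 = \mathrm{Var}(T(i))$ and $\sigma_j^2 = \mathrm{Var}(T(j))$. It follows that

$$\mu_j = Mp\left(\Gamma - \psi_{cb}(1-\Gamma)\right) \quad \text{and} \quad \mu_i = Mp\left(\gamma_0\Gamma - \psi_{cb}(1-\gamma_0\Gamma)\right), \qquad (11)$$

$$\sigma_j^2 \le Mp\left(\Gamma + \psi_{cb}^2(1-\Gamma)\right) \quad \text{and} \quad \sigma_i^2 \le Mp\left(\gamma_0\Gamma + \psi_{cb}^2(1-\gamma_0\Gamma)\right). \qquad (12)$$

A brief explanation of the above equations is presented in Appendix B. We note that $\mu_j - \mu_i = Mp\Gamma(1-\gamma_0)(1+\psi_{cb}) > 0$. To simplify (10) further, we present the following proposition: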

Proposition 1. Define $\tau = \frac{\mu_i + \mu_j}{2}$. Then, for any $\epsilon_0 > 0$, it follows that

$$\Pr(\mathcal{E}) \le K \binom{N-K}{L-1} \left(P_{eh}\right)^{N_0} + K P_{ed}, \qquad (13)$$

where $P_{eh} = P(\{T(j) \le \tau + \epsilon_0\})$ for any $j \in S_z$, and $P_{ed} = P(\{T(i) > \tau\})$ for any $i \in S_d$.

The proof of the above proposition is presented in Section V-A3. Note that the above definitions of $P_{eh}$ and $P_{ed}$ are unambiguous, because the corresponding probabilities do not depend on the specific choice of the indices j and i, respectively.
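The positivity of $\mu_j - \mu_i$, noted before Proposition 1, can be checked by subtracting the two expressions in (11) term by term. This derivation is added only for illustration:

```latex
\begin{aligned}
\mu_j - \mu_i &= Mp\left[\Gamma - \psi_{cb}(1-\Gamma) - \gamma_0\Gamma + \psi_{cb}(1-\gamma_0\Gamma)\right] \\
&= Mp\left[\Gamma(1-\gamma_0) + \psi_{cb}\,\Gamma(1-\gamma_0)\right] \\
&= Mp\,\Gamma(1-\gamma_0)(1+\psi_{cb}),
\end{aligned}
```

This quantity is strictly positive whenever $\Gamma > 0$, $\gamma_0 < 1$ and $\psi_{cb} \ge 0$.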

Our next task is to bound $P_{eh}$ and $P_{ed}$ as defined in the above proposition. For any k, $T(k)$ is a sum of M independent random variables, each bounded by $\max(1, \psi_{cb})$. We can therefore use Bernstein's inequality [30]³ to bound the probability of its deviation from its mean value. Since $\psi_{cb}$ is a free parameter, we proceed by assuming that $\psi_{cb} \le 1$. Thus, for any $i \in S_d$, with $\delta_0 = \tau - \mu_i = \frac{\mu_j - \mu_i}{2}$,

$$P_{ed} = P(T(i) > \tau) = P(T(i) > \mu_i + \delta_0) \le \exp\left(-\frac{\delta_0^2}{2\sigma_i^2 + \frac{2}{3}\delta_0}\right). \qquad (14)$$

³For ease of reference, we have stated it in Appendix G.


March I, 2016 


DRAFT 











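Similarly, for any $j \in S_z$, we choose $\epsilon_0 = \frac{\mu_j - \tau}{2} = \frac{\mu_j - \mu_i}{4}$, and get

$$P_{eh} = P(T(j) \le \tau + \epsilon_0) = P(T(j) \le \mu_j - \epsilon_0) \le \exp\left(-\frac{\epsilon_0^2}{2\sigma_j^2 + \frac{2}{3}\epsilon_0}\right). \qquad (15)$$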


We now proceed separately for each algorithm to arrive at the final results. Before that, we note that by choosing $p = a/K$ with $a = \frac{1}{1-u}$,

$$\left(1 - \frac{(1-u)a}{K}\right)^K \ge \exp(-2a(1-u)) = e^{-2}.$$

This follows from the fact that, for $0 < b < 1$, $(1-b) \le e^{-b} \le 1 - b/2$. Thus, $(1-q)e^{-1} \ge \Gamma \ge (1-q)e^{-2}$. We also note that $\gamma_0 < 1$ for any $u < 0.5$ and for all $K > 1$.
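The two facts above can be spot-checked numerically under the design choice $p = 1/((1-u)K)$. This is a minimal sketch. The function names `design_quantities` and `check_bounds` are hypothetical. The grid of K and u values is arbitrary, and a small tolerance absorbs floating-point rounding.

```python
import math

def design_quantities(K, u, q=0.0):
    """Gamma and gamma0 (Lemma 1 formulas) under the design choice p = 1/((1-u)K)."""
    p = 1.0 / ((1 - u) * K)
    base = 1 - (1 - u) * p            # probability that an item does not participate
    Gamma = (1 - q) * base ** K
    gamma0 = u / base
    return Gamma, gamma0

def check_bounds(K_max=200, us=(0.0, 0.1, 0.2, 0.3, 0.4, 0.49), q=0.1, tol=1e-12):
    """True if (1-q)e^-2 <= Gamma <= (1-q)e^-1 and gamma0 < 1 on the whole grid."""
    for K in range(2, K_max + 1):
        for u in us:
            Gamma, gamma0 = design_quantities(K, u, q)
            if not ((1 - q) * math.exp(-2) - tol <= Gamma <= (1 - q) * math.exp(-1) + tol):
                return False
            if not gamma0 < 1:
                return False
    return True

print(check_bounds())  # True
```

Passing `us=(0.6,)` makes the check fail, because $\gamma_0$ exceeds 1 once $u \ge 0.5$ at $K = 2$.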

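1) Proof for RoAl

For RoAl, $\psi_{cb} = 0$. Thus, from (11) and (12), we have $\mu_j - \mu_i = Mp\Gamma(1-\gamma_0)$, $\sigma_j^2 \le Mp\Gamma$ and $\sigma_i^2 \le Mp\gamma_0\Gamma$. Recall that $\delta_0 = \frac{\mu_j - \mu_i}{2}$ and $\epsilon_0 = \frac{\mu_j - \mu_i}{4}$. Note that $2\sigma_i^2 + (2/3)\delta_0 \le Mp\Gamma\left(2\gamma_0 + (1-\gamma_0)/3\right) \le 2Mp\Gamma$. Similarly, $2\sigma_j^2 + (2/3)\epsilon_0 \le Mp\Gamma\left(2 + (1-\gamma_0)/6\right) \le 3Mp\Gamma$. Thus, from (14) and (15), we have

$$P_{ed} \le \exp\left(-\frac{Mp\Gamma(1-\gamma_0)^2}{8}\right) \quad \text{and} \quad P_{eh} \le \exp\left(-\frac{Mp\Gamma(1-\gamma_0)^2}{48}\right). \qquad (16)$$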
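The constants 8 and 48 in (16) can be traced explicitly. Substitute $\delta_0$, $\epsilon_0$ and the two denominator bounds stated just before (16) into (14) and (15). This worked arithmetic is added for illustration only:

```latex
\begin{aligned}
\delta_0 &= \tfrac{\mu_j-\mu_i}{2} = \tfrac{Mp\Gamma(1-\gamma_0)}{2}, \qquad
\epsilon_0 = \tfrac{\mu_j-\mu_i}{4} = \tfrac{Mp\Gamma(1-\gamma_0)}{4},\\
\frac{\delta_0^2}{2\sigma_i^2+\frac{2}{3}\delta_0} &\ge \frac{M^2p^2\Gamma^2(1-\gamma_0)^2/4}{2Mp\Gamma} = \frac{Mp\Gamma(1-\gamma_0)^2}{8},\\
\frac{\epsilon_0^2}{2\sigma_j^2+\frac{2}{3}\epsilon_0} &\ge \frac{M^2p^2\Gamma^2(1-\gamma_0)^2/16}{3Mp\Gamma} = \frac{Mp\Gamma(1-\gamma_0)^2}{48}.
\end{aligned}
```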


Thus, choosing $p = \frac{1}{(1-u)K}$ and noting that $\Gamma \ge e^{-2}(1-q)$, from (13) we get

$$P(\mathcal{E}) \le \exp\left(-\frac{M(1-\gamma_0)^2(1-q)N_0}{C_{a1}K(1-u)} + \log\left(K\binom{N-K}{L-1}\right)\right) + \exp\left(-\frac{M(1-\gamma_0)^2(1-q)}{C_{a2}K(1-u)} + \log K\right),$$

with $C_{a1} = 48e^2$ and $C_{a2} = 8e^2$. Thus, suppose M is chosen as specified in (3), with the constants $C_{a1}$ and $C_{a2}$ chosen as above. Then the error probability is upper bounded by $\exp\left(-c_0 \log\left(K\binom{N-K}{L-1}\right)\right) + \exp(-c_0 \log K)$.
2) Proof for CoAl

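We first bound $P_{ed}$. With $\psi_{cb} = \psi_0$, where $\psi_0 = \frac{\gamma_0\Gamma}{1-\gamma_0\Gamma}$, we have $\sigma_i^2 \le Mp\gamma_0\Gamma(1+\psi_0)$. We also note that $\psi_0 < 1$. Thus, $2\sigma_i^2 + (2/3)\delta_0 \le Mp\Gamma(1+\psi_0)\left(2\gamma_0 + (1-\gamma_0)/3\right) \le 2Mp\Gamma(1+\psi_0)$. Hence, from (14), we get

$$P_{ed} \le \exp\left(-\frac{Mp\Gamma(1+\psi_0)(1-\gamma_0)^2}{8}\right). \qquad (17)$$

With $\psi_0$ as above, it follows that $2\sigma_j^2 + (2/3)\epsilon_0 \le Mp\Gamma\left(2 + \frac{2\psi_0^2(1-\Gamma)}{\Gamma} + \frac{(1-\gamma_0)(1+\psi_0)}{6}\right) \le 3Mp\Gamma(1+\psi_0)$, since $1 + \psi_0 = \frac{1}{1-\gamma_0\Gamma}$. Thus, from (15), we get

$$P_{eh} \le \exp\left(-\frac{Mp\Gamma(1+\psi_0)(1-\gamma_0)^2}{48}\right). \qquad (18)$$

The next steps follow exactly as for RoAl. If M is chosen as specified in (3), with the constants $C_{a1}$ and $C_{a2}$ chosen as $48e^2$ and $8e^2$, respectively, then the error probability remains smaller than $\exp\left(-c_0 \log\left(K\binom{N-K}{L-1}\right)\right) + \exp(-c_0 \log K)$.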
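The algebra behind the CoAl variance bound is short. It assumes $\psi_{cb} = \psi_0 = \frac{\gamma_0\Gamma}{1-\gamma_0\Gamma}$, which is our reading of the definition used in this proof. Under that assumption, the following identities give $\sigma_i^2 \le Mp\gamma_0\Gamma(1+\psi_0)$ and $\psi_0 < 1$:

```latex
\begin{aligned}
\psi_0 &= \frac{\gamma_0\Gamma}{1-\gamma_0\Gamma}
\;\Longrightarrow\; 1+\psi_0 = \frac{1}{1-\gamma_0\Gamma},
\qquad \psi_0^2(1-\gamma_0\Gamma) = \gamma_0\Gamma\,\psi_0,\\
\sigma_i^2 &\le Mp\left(\gamma_0\Gamma + \psi_0^2(1-\gamma_0\Gamma)\right) = Mp\,\gamma_0\Gamma(1+\psi_0),\\
\psi_0 &< 1 \iff \gamma_0\Gamma < \tfrac{1}{2}, \text{ which holds since } \gamma_0 < 1 \text{ and } \Gamma \le (1-q)e^{-1} < \tfrac{1}{2}.
\end{aligned}
```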
3) Proof of Proposition 1

For $i \in S_d$, define $H_i = \{T(i) \le \tau\}$. The error event in (10) is a subset of the right-hand side of the following equation:

$$\mathcal{E} \subseteq \bigcup_{i \in S_d} \left( \left\{ \bigcup_{S_z \in \mathcal{S}_z} \bigcap_{j \in S_z} (\mathcal{E}_{ij} \cap H_i) \right\} \cup H_i^c \right), \qquad (19)$$

where $\mathcal{E}_{ij} = \{T(j) \le T(i)\}$ for any $i \in S_d$ and $j \in S_z$. In the above, we have used the fact that, for any two sets A and B, $A \subseteq \{A \cap B\} \cup B^c$. Further, using monotonicity properties, we have

$$\{\mathcal{E}_{ij} \cap H_i\} \subseteq \{T(j) \le \tau\} \subseteq \{T(j) \le \tau + \epsilon_0\}, \qquad (20)$$

where $\epsilon_0 > 0$ is any constant. Consider any non-defective item $j \in S_z$. For any given $\mathbf{y}$, $T(j)$ can be represented as a function of only $\mathbf{x}_j$, i.e., the jth column of the test matrix X. From (1), since $j \notin S_d$, the output is independent of the entries of $\mathbf{x}_j$. Hence, the $T(j)$'s are independent for all $j \notin S_d$, and in particular for all $j \in S_z$. Using this observation, the claim in the proposition follows from (19) and (20), by accounting for the cardinalities of the different sets involved in the union bound.

4) Proof of Corollary 1

For the uniform case, we use the union bound over all possible choices of the defective set. The proof of the corollary follows the same steps as the proof of Theorem 1. The only difference is the additional union bound needed to account for all possible choices of the defective set. Here, we briefly discuss the multiplicative factors that this additional union bound introduces. Let $\mathcal{S}_d$ denote the set of all possible defective sets, and note that $|\mathcal{S}_d| = \binom{N}{K}$. From (19) in the proof of Proposition 1, we note that

$$\mathcal{E} \subseteq \left\{ \bigcup_{S_d \in \mathcal{S}_d} \bigcup_{i \in S_d} \bigcup_{S_z \in \mathcal{S}_z} \bigcap_{j \in S_z} (\mathcal{E}_{ij} \cap H_i) \right\} \cup \left\{ \bigcup_{S_d \in \mathcal{S}_d} \bigcup_{i \in S_d} H_i^c \right\}. \qquad (21)$$

Thus, the first term in (13) needs an additional multiplicative factor of $\binom{N}{K}$ to account for all possible defective sets. For the second term, we note that

$$\bigcup_{S_d \in \mathcal{S}_d} \bigcup_{i \in S_d} H_i^c \subseteq \bigcup_{i \in [N]} H_i^c. \qquad (22)$$

Thus, in the second term of (13), the multiplicative factor of K is replaced by a factor of N, and no additional combinatorial factors are needed. The corollary now follows using the same steps as in the proofs for RoAl and CoAl.
B. Proof of Theorem 2

Let $X \in \{0,1\}^{M \times N}$ denote the random test matrix and $\mathbf{y} \in \{0,1\}^M$ the output of the group tests. Let $\mathcal{T}_z = \{l \in [M] : y(l) = 0\}$ with $M_z = |\mathcal{T}_z|$, and $\mathcal{T}_p = \{l \in [M] : y(l) = 1\}$ with $M_p = |\mathcal{T}_p|$. Let $X_z = X(\mathcal{T}_z, :)$ and $X_p = X(\mathcal{T}_p, :)$. Note that $X_z \in \{0,1\}^{M_z \times N}$ and $X_p \in \{0,1\}^{M_p \times N}$. For ease of performance analysis of the LP (LP0), we work with the following equivalent program:

$$\text{(LP0a)} \quad \underset{\mathbf{z}}{\text{minimize}} \ \mathbf{1}_{M_z}^T X_z \mathbf{z} \quad \text{subject to} \quad \mathbf{0}_N \preceq \mathbf{z} \preceq \mathbf{1}_N, \quad \mathbf{1}_N^T \mathbf{z} \ge (N-L). \qquad (23)$$

The above formulation is obtained by eliminating the equality constraints and replacing the optimization variable $\mathbf{z}$ by $(\mathbf{1}_N - \mathbf{z})$. Hence, the non-defective subset output by (23) is indexed by the smallest L entries in the solution of (LP0a), as opposed to the largest L entries in the solution of (LP0). Strong duality holds for a linear program, and any pair of primal and dual optimal points satisfies the Karush-Kuhn-Tucker (KKT) conditions [31]. Hence, a characterization of the primal solution can be obtained in terms of the dual optimal points by using the KKT conditions. Let $\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2 \in \mathbb{R}^N$ and $\nu \in \mathbb{R}$ denote the dual variables associated with the inequality constraints in (LP0a). The KKT conditions for any pair of primal and dual optimal points of (LP0a) can be written as follows:

$$X_z^T \mathbf{1}_{M_z} - \boldsymbol{\lambda}_1 + \boldsymbol{\lambda}_2 - \nu \mathbf{1}_N = \mathbf{0}_N; \qquad (24)$$

$$\boldsymbol{\lambda}_1 \circ \mathbf{z} = \mathbf{0}_N; \quad \boldsymbol{\lambda}_2 \circ (\mathbf{z} - \mathbf{1}_N) = \mathbf{0}_N; \quad \nu\left(\mathbf{1}_N^T \mathbf{z} - (N-L)\right) = 0; \qquad (25)$$

$$\mathbf{0}_N \preceq \mathbf{z} \preceq \mathbf{1}_N; \quad \mathbf{1}_N^T \mathbf{z} \ge (N-L); \quad \boldsymbol{\lambda}_1 \succeq \mathbf{0}_N; \quad \boldsymbol{\lambda}_2 \succeq \mathbf{0}_N; \quad \nu \ge 0. \qquad (26)$$

Let $(\mathbf{z}, \boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2, \nu)$ be the primal-dual optimal point, i.e., a point satisfying (24)-(26). Let $S_d$ denote the set of defective items. Further, let $S_r$ denote the index set of the smallest L entries of the primal solution $\mathbf{z}$, and hence the declared set of non-defective items. We first derive a sufficient condition for successful non-defective subset recovery with RoLpAl.
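To make (24)-(26) concrete, the following sketch checks the KKT conditions for a candidate primal-dual point of (LP0a). It uses a tiny hand-built instance with four items, three negative tests and L = 2. The instance and the dual values were constructed by hand for illustration and do not come from the paper. The helper names `kkt_holds` and `theta_bounds` are hypothetical. The sketch also computes the quantities $\theta_0$ and $\theta_1$ of Proposition 4, and the declared non-defective set, i.e., the smallest L entries of $\mathbf{z}$.

```python
def kkt_holds(Xz, z, lam1, lam2, nu, L, tol=1e-9):
    """Check (24)-(26) for (LP0a) with cost c_i = 1^T X_z(:, i)."""
    N = len(z)
    c = [sum(row[i] for row in Xz) for i in range(N)]
    # (24): zero-gradient condition, one equation per item
    if any(abs(c[i] - lam1[i] + lam2[i] - nu) > tol for i in range(N)):
        return False
    # (25): complementary slackness
    if any(abs(lam1[i] * z[i]) > tol or abs(lam2[i] * (z[i] - 1)) > tol for i in range(N)):
        return False
    if abs(nu * (sum(z) - (N - L))) > tol:
        return False
    # (26): primal and dual feasibility
    primal_ok = all(-tol <= zi <= 1 + tol for zi in z) and sum(z) >= N - L - tol
    dual_ok = all(v >= -tol for v in lam1 + lam2) and nu >= -tol
    return primal_ok and dual_ok

def theta_bounds(Xz, lam1):
    """theta0 = max cost over {lambda1 = 0}, theta1 = min cost over {lambda1 > 0}."""
    N = len(lam1)
    c = [sum(row[i] for row in Xz) for i in range(N)]
    theta0 = max(c[i] for i in range(N) if lam1[i] == 0)
    theta1 = min(c[i] for i in range(N) if lam1[i] > 0)
    return theta0, theta1

Xz = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 0, 0]]  # column sums (costs): 3, 0, 1, 2
z = [0, 1, 1, 0]
lam1, lam2, nu, L = [1.5, 0, 0, 0.5], [0, 1.5, 0.5, 0], 1.5, 2
print(kkt_holds(Xz, z, lam1, lam2, nu, L))          # True
print(theta_bounds(Xz, lam1))                         # (1, 2), so theta0 <= nu <= theta1
declared = sorted(range(len(z)), key=lambda i: z[i])[:L]
print(declared)                                       # [0, 3]: the two most-tested items
```

Changing `nu` to 0.5 while keeping the other values breaks the zero-gradient condition (24), and `kkt_holds` then returns False.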

Proposition 2. If $\lambda_2(i) > 0 \ \forall i \in S_d$, then $S_r \cap S_d = \emptyset$.

Proof: See Appendix C.

Let $\mathcal{E}$, $P(\mathcal{E})$, $S_z$ and $\mathcal{S}_z$ be as defined at the beginning of this section. The above sufficient condition for successful non-defective subset recovery, in turn, leads to the following:
Proposition 3. The error event associated with RoLpAl satisfies:

$$\mathcal{E} \subseteq \bigcup_{i \in S_d} \bigcup_{S_z \in \mathcal{S}_z} \left\{\mathbf{1}_{M_z}^T X_z(:,i) \ge \mathbf{1}_{M_z}^T X_z(:,j), \ \forall j \in S_z\right\}. \qquad (27)$$

Proof: Define $\mathcal{E}_0(i) = \{\lambda_2(i) = 0\}$. We first note from (24) that, for any $i \in [N]$,

$$\lambda_2(i) = 0 \implies \mathbf{1}_{M_z}^T X_z(:,i) = \lambda_1(i) + \nu \ge \nu. \qquad (28)$$

Define $\theta_0 = \max_{\{i : \lambda_1(i) = 0\}} \mathbf{1}_{M_z}^T X_z(:,i)$ and $\theta_1 = \min_{\{i : \lambda_1(i) > 0\}} \mathbf{1}_{M_z}^T X_z(:,i)$. We relate $\theta_0$, $\theta_1$ and $\nu$ as follows:

Proposition 4. The dual optimal variable $\nu$ satisfies $\theta_0 \le \nu \le \theta_1$.

Proof: See Appendix D.

From the above proposition and (28), it follows that

$$\mathcal{E}_0(i) \subseteq \left\{\mathbf{1}_{M_z}^T X_z(:,i) \ge \theta_0\right\}. \qquad (29)$$

There exist at most L items for which $\lambda_1(i) > 0$; otherwise, the solution would violate the primal feasibility constraint $\mathbf{1}_N^T \mathbf{z} \ge (N-L)$. Thus, there exist at least $(N-K)-(L-1)$ non-defective items in the set $\{i : \lambda_1(i) = 0\}$. From (29), there exists a set $S_z$ of $(N-K)-(L-1)$ non-defective items such that $\mathbf{1}_{M_z}^T X_z(:,i) \ge \mathbf{1}_{M_z}^T X_z(:,j)$ for all $j \in S_z$. Taking the union bound over all possible $S_z$, we get

$$\mathcal{E}_0(i) \subseteq \bigcup_{S_z \in \mathcal{S}_z} \left\{\mathbf{1}_{M_z}^T X_z(:,i) \ge \mathbf{1}_{M_z}^T X_z(:,j), \ \forall j \in S_z\right\}, \qquad (30)$$

and (27) now follows, since Proposition 2 gives $\mathcal{E} \subseteq \bigcup_{i \in S_d} \mathcal{E}_0(i)$. ■

Note that, for a given i, the quantity $\mathbf{1}_{M_z}^T X_z(:,i)$ is the same as the quantity $T(i)$ with $\psi_{cb} = 0$ defined in the proof of Theorem 1, and (27) is the same as (10). Following the same analysis as in Section V-A1, we conclude the following. If M satisfies (3) with $\psi_0 = 0$, the LP relaxation based algorithm RoLpAl recovers L non-defective items with probability exceeding $1 - \exp\left(-c_0 \log\left(K\binom{N-K}{L-1}\right)\right) - \exp(-c_0 \log K)$.



C. Proof Sketch for Theorem 3

We use the same notation as in the proof of Theorem 2. We analyze an equivalent program obtained by replacing $(\mathbf{1}_N - \mathbf{z})$ by $\mathbf{z}$. LP2 differs from LP0 only in the objective function; the constraint set remains the same. Thus, the complementary slackness and the primal-dual feasibility conditions are the same as in (25) and (26), respectively. The zero-gradient condition for LP2 is given by:

$$X_z^T \mathbf{1}_{M_z} - \psi_{lp} X_p^T \mathbf{1}_{M_p} - \boldsymbol{\lambda}_1 + \boldsymbol{\lambda}_2 - \nu \mathbf{1}_N = \mathbf{0}_N. \qquad (31)$$

Let the error event associated with CoLpAl be denoted by $\mathcal{E}$. Let $i \in S_d$, and define $\mathcal{E}_i = \{i \in S_L\}$. Note that $\mathcal{E} \subseteq \bigcup_{i \in S_d} \mathcal{E}_i$. Further, $\mathcal{E}_i \subseteq \mathcal{A}_i \cup \mathcal{B}_i$, where $\mathcal{A}_i = \{\lambda_2(i) = 0\}$ and $\mathcal{B}_i = \{\mathcal{E}_i \cap \{\lambda_2(i) > 0\}\}$. Let us first analyze $\mathcal{B}_i$. Using arguments similar to those in Propositions 2 and 4, it can be shown that

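$$\mathcal{B}_i \subseteq \{\nu = 0\} \subseteq \bigcup_{S_z \in \mathcal{S}_z} \left\{\mathbf{1}_{M_z}^T X_z(:,j) - \psi_{lp}\mathbf{1}_{M_p}^T X_p(:,j) \le 0, \ \forall j \in S_z\right\}, \qquad (32)$$

where $S_z \subset [N] \setminus S_d$ is any set of non-defective items such that $|S_z| = (N-K)-(L-1)$, and $\mathcal{S}_z$ denotes all such sets. Further, using arguments similar to those in the proof of Theorem 2, it can be shown that

$$\mathcal{A}_i \subseteq \bigcup_{S_z \in \mathcal{S}_z} \left\{\mathbf{1}_{M_z}^T X_z(:,i) - \psi_{lp}\mathbf{1}_{M_p}^T X_p(:,i) \ge \mathbf{1}_{M_z}^T X_z(:,j) - \psi_{lp}\mathbf{1}_{M_p}^T X_p(:,j), \ \forall j \in S_z\right\}, \qquad (33)$$

where $S_z$ and $\mathcal{S}_z$ are as defined above.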

The subsequent analysis uses Bernstein's inequality to upper bound the probabilities of the events $\mathcal{A}_i$ and $\mathcal{B}_i$, in a manner similar to the previous proofs; we omit the details for brevity. Define $\psi_0' = \min\left(\frac{\gamma_0\Gamma}{1-\gamma_0\Gamma}, \frac{\Gamma}{2(1-\Gamma)}\right)$. Note that, with $\psi_{lp} = \psi_0'$, $E\left(\mathbf{1}_{M_z}^T X_z(:,j) - \psi_{lp}\mathbf{1}_{M_p}^T X_p(:,j)\right) \ge Mp\Gamma/2 > 0$ for any $j \in S_z$. This makes it possible to upper bound the probability of $\mathcal{B}_i$ using Bernstein's inequality. In essence, it can be shown that there exists an absolute constant $C_{4b} > 0$ such that

$$P\left(\bigcup_{i \in S_d} \mathcal{B}_i\right) \le \exp\left(-\frac{Mp\Gamma N_0}{C_{4b}} + \log\left(K\binom{N-K}{L-1}\right)\right). \qquad (34)$$
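The positive-mean condition used for $\mathcal{B}_i$ can be written out. This illustration assumes, as in (11), that a test containing a given non-defective item is negative with probability $\Gamma$ and positive with probability $1-\Gamma$. Writing out the expectation shows that the bound $Mp\Gamma/2$ holds whenever $\psi_{lp} \le \Gamma/(2(1-\Gamma))$:

```latex
\begin{aligned}
E\left[\mathbf{1}_{M_z}^T X_z(:,j) - \psi_{lp}\mathbf{1}_{M_p}^T X_p(:,j)\right]
&= Mp\left(\Gamma - \psi_{lp}(1-\Gamma)\right)\\
&\ge \frac{Mp\Gamma}{2} \iff \psi_{lp} \le \frac{\Gamma}{2(1-\Gamma)}.
\end{aligned}
```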


Similarly, following the same steps as in the proof of Theorem 1, it can be shown that, for the chosen value of $\psi_{lp}$, there exist absolute constants $C_{4a}$ and $C_{4c}$ such that

$$P\left(\bigcup_{i \in S_d} \mathcal{A}_i\right) \le \exp\left(-\frac{Mp\Gamma(1-\gamma_0)^2 N_0}{C_{4a}} + \log\left(K\binom{N-K}{L-1}\right)\right) + \exp\left(-\frac{Mp\Gamma(1-\gamma_0)^2}{C_{4c}} + \log K\right). \qquad (35)$$

The final result now follows by substituting $p = \frac{1}{(1-u)K}$. Choose M as in (3) with $\psi_0 = 0$, $C_{a1} = \max\{C_{4a}, C_{4b}\}$ and $C_{a2} = C_{4c}$. Then the total error in (34) and (35) can be upper bounded by $2\exp\left(-c_0 \log\left(K\binom{N-K}{L-1}\right)\right) + \exp(-c_0 \log K)$. This concludes the proof.


VI. Simulations 

In this section, we investigate the empirical performance of the algorithms proposed in this work for non-defective subset recovery. The previous section derived theoretical guarantees on the number of tests from upper bounds on the probability of error of these algorithms. In contrast, here we empirically determine the number of tests required to achieve a given performance level, which highlights the practical ability of the proposed algorithms to recover a non-defective subset. Apart from validating the general theoretical trends, this also allows a direct comparison of the presented algorithms.

Our setup is as follows. For a given set of operating parameters, i.e., N, K, u, q and M, we choose a defective set $S_d \subset [N]$ at random such that $|S_d| = K$, and generate the test output vector $\mathbf{y}$ according to the signal model. We then recover a subset of L non-defective items using each of the recovery algorithms, i.e., RoAl, CoAl, RoLpAl, RoLpAl++ and CoLpAl, and compare it with the defective set. The empirical probability of error is the fraction of trials in which the recovery was not successful, i.e., the output non-defective subset contained at least one defective item. This experiment is repeated for different values of M and L. For each trial, the test matrix X is generated with i.i.d. Bernoulli entries, i.e., $X_{ij} \sim \mathcal{B}(p)$, where $p = 1/K$. Also, for CoAl and CoLpAl, we set $\psi_{cb} = \frac{\gamma_0\Gamma}{1-\gamma_0\Gamma}$ and $\psi_{lp} = \min\left(\frac{\gamma_0\Gamma}{1-\gamma_0\Gamma}, \frac{\Gamma}{2(1-\Gamma)}\right)$, respectively. Unless otherwise stated, we set N = 256, K = 16, u = 0.05 and q = 0.1, and we vary L and M.
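The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the noise model described in Appendix A: each pooled defective is diluted out with probability u, and additive noise turns a negative outcome positive with probability q. It implements only the RoAl and CoAl ranking rules and breaks ties by item index. The names `run_tests`, `roal`, `coal` and `empirical_error` are hypothetical. The weight `psi=0.2` in the example is an arbitrary illustrative value, not the setting used for the figures. Pure Python is slow at N = 256 with hundreds of tests, so the example uses a smaller instance.

```python
import random

def run_tests(N, K, M, p, u, q, rng):
    """Draw a random defective set, an i.i.d. Bernoulli(p) test matrix, and noisy outcomes.

    Assumed model: a test is positive if some pooled defective survives dilution
    (prob. 1 - u each), or if additive noise (prob. q) flips a negative outcome.
    """
    defective = set(rng.sample(range(N), K))
    X = [[1 if rng.random() < p else 0 for _ in range(N)] for _ in range(M)]
    y = []
    for row in X:
        hit = any(row[i] and rng.random() >= u for i in defective)
        y.append(1 if hit or rng.random() < q else 0)
    return X, y, defective

def roal(X, y, L):
    """RoAl-style rule: the L items that appear most often in negative tests."""
    N = len(X[0])
    z = [sum(row[i] for row, out in zip(X, y) if out == 0) for i in range(N)]
    return sorted(range(N), key=lambda i: -z[i])[:L]  # ties broken by item index

def coal(X, y, L, psi):
    """CoAl-style rule: rank by T(i) = (#negative tests with i) - psi * (#positive tests with i)."""
    N = len(X[0])
    T = [sum(row[i] * (1.0 if out == 0 else -psi) for row, out in zip(X, y)) for i in range(N)]
    return sorted(range(N), key=lambda i: -T[i])[:L]

def empirical_error(decoder, N, K, L, M, u, q, trials, seed=0):
    """Fraction of trials in which the decoder's output contains a defective item."""
    rng = random.Random(seed)
    p = 1.0 / K  # p = 1/K, as in the setup above
    failures = 0
    for _ in range(trials):
        X, y, defective = run_tests(N, K, M, p, u, q, rng)
        if defective & set(decoder(X, y, L)):
            failures += 1
    return failures / trials

print(empirical_error(roal, N=64, K=4, L=16, M=120, u=0.05, q=0.1, trials=50))
print(empirical_error(lambda X, y, L: coal(X, y, L, psi=0.2),
                      N=64, K=4, L=16, M=120, u=0.05, q=0.1, trials=50))
```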

Figure [2] shows the variation of the empirical probability of error with the number of tests, for 
L = 64 and L = 128. These curves demonstrate the theoretically expected exponential behavior of 
the average error rates, the similarity of the error rate performance of algorithms RoAl and RoLpAl, 
and the performance improvement offered by RoLpAl++ at higher values of L. We also note that, as 
expected, the algorithms that use tests with both positive and negative outcomes perform better than the 
algorithms that use only tests with negative outcomes. 

Figure 3 presents the number of tests M required to achieve a target error rate of 10%, as a function of the size L of the non-defective subset. For small values of L, the algorithms perform similarly but, in general, CoAl and CoLpAl are the best performing algorithms across all values of L. As argued in Section III-C, RoLpAl++ performs similarly to RoLpAl for small values of L, while for large values of L its performance matches that of CoLpAl. Also, as mentioned in Section IV, M increases linearly with L, especially for small values of L.

We also compare the proposed algorithms with an algorithm that identifies the non-defective items by first identifying the defective items, i.e., we compare the “direct” and “indirect” approaches [21] to identifying a non-defective subset. The indirect approach first employs a defective set recovery algorithm to identify the defective set, and then chooses L items uniformly at random from the complement set. This algorithm is referred to as “InDirAl” in Figure 3. In particular, we have used the “No-LiPo-” algorithm [17] for defective set identification. The “direct” approach significantly outperforms the “indirect” approach. We also compare against a non-adaptive scheme that tests items one by one, where the item tested in each test is chosen uniformly at random from the population. The L items tested most often in tests with negative outcomes are chosen as the non-defective subset. This algorithm is referred to as “NA1by1” (Non-Adaptive 1-by-1) in Figure 3. The group testing based algorithms significantly outperform the NA1by1 strategy.
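The NA1by1 baseline described above can be sketched in a few lines. This is an illustrative sketch under the same assumed noise model as Appendix A, applied to single-item tests: a defective item yields a positive outcome unless diluted (probability u), and additive noise flips a negative outcome with probability q. The function name `na1by1_error` is hypothetical.

```python
import random

def na1by1_error(N, K, L, M, u, q, trials, seed=0):
    """Empirical error rate of one-item-per-test (non-adaptive) selection."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        defective = set(rng.sample(range(N), K))
        neg_count = [0] * N
        for _ in range(M):
            item = rng.randrange(N)  # the single item in this test, chosen uniformly
            positive = (item in defective and rng.random() >= u) or rng.random() < q
            if not positive:
                neg_count[item] += 1
        # Keep the L items seen most often in negative tests (ties by index).
        chosen = sorted(range(N), key=lambda i: -neg_count[i])[:L]
        if defective & set(chosen):
            failures += 1
    return failures / trials

print(na1by1_error(N=256, K=16, L=64, M=400, u=0.05, q=0.1, trials=200))
```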

Figure 4 compares the number of tests CoAl requires to achieve a target error rate of 10% with the information theoretic lower bound, for two different values of K.⁴ The empirical performance of CoAl is within an $O(\log K)$ factor of the lower bound. The other algorithms behave similarly.

As discussed in Section IV, the parameter settings require knowledge of K. Here, we investigate how sensitive the algorithms are when the test matrix is designed for a nominal value of K that differs from the true value. Let the true number of defective items be $K_t$. Let $M(\hat{K}, K_t)$ denote the number of tests required to achieve a given error rate when the test matrix is designed with $K = \hat{K}$. Let $A_m(\hat{K}, K_t) = \frac{M(\hat{K}, K_t)}{M(K_t, K_t)}$. Thus, $A_m(\hat{K}, K_t)$ represents the penalty paid compared to the case when the test is designed knowing the number of defective items. Table II shows the empirically computed $A_m$ for the different algorithms and for different values of the uncertainty factor $A_k = \hat{K}/K_t$. The algorithms are robust to uncertainty in the knowledge of K. For example, even when $\hat{K} = 2K_t$, i.e., $A_k = 2$, we only pay a penalty of approximately 17% for most of the algorithms. Also, as suggested by the analysis of the upper bounds in Section IV, the robustness of the algorithms is asymmetric: they are more robust for $A_k > 1$ than for $A_k < 1$.

Figure 5 shows the performance of the different algorithms as the system noise parameters vary. Again, in agreement with the analysis of the probability of error, the algorithms perform similarly with respect to variations in both the additive and the dilution noise.

⁴We refer the reader to Theorem 3 and Section IV in [21] for a detailed discussion of the information theoretic lower bound. Also, see equations (7) and (9) in [32] for the derivation of the mutual information term required for computing the lower bound for the group testing signal model.

Fig. 2. Average probability of error (APER) vs. number of tests M. The APER decays exponentially with M.

TABLE II
Robustness of the non-defective subset identification algorithms to uncertainty in the knowledge of K. The numbers in the table are $A_m(\hat{K}, K_t)$.
($K_t$ = 16, N = 256, L = 128, q = 0.1, u = 0.05)

Algorithm    A_k = 0.75    A_k = 1.5    A_k = 2.0
RoAl         1.13          1.06         1.20
CoAl         1.13          1.04         1.17
RoLpAl       1.09          1.04         1.17
RoLpAl++     1.04          1.00         1.17
CoLpAl       1.11          1.03         1.19

VII. Conclusions 

In this work, we have proposed analytically tractable and computationally efficient algorithms for identifying a non-defective subset of a given size in a noisy non-adaptive group testing setup. We have derived upper bounds on the number of tests for guaranteed correct subset identification, and we have shown that the upper bounds and the information theoretic lower bounds are order-wise tight up to a polylog factor. We have also shown that the algorithms are robust to uncertainty in the knowledge of the system parameters. Further, the algorithms that use both positive and negative outcomes, namely CoAl and the LP relaxation based CoLpAl, gave the best performance over a wide range of values of L, the size of the non-defective subset to be identified. In this work, we have considered a randomized pooling strategy. It will be interesting to study deterministic constructions for the purpose of non-defective subset identification; this could be considered in a future extension of this work. Another interesting question is to extend the non-defective subset identification problem to structured pooling strategies, e.g., graph constrained group testing, where each pool is constrained to the nodes that lie on a path of a given graph.

[Fig. 3 plot: number of tests vs. size of healthy subset L at an average error rate of 10%; N = 256, K = 16, u = 0.05, q = 0.1; curves: NA1by1, InDirAl, RoAl, RoLpAl, RoLpAl++, CoAl.]

Fig. 3. Number of tests vs. size of non-defective subset. Algorithm CoLpAl performs the best among the ones considered. The direct approach for finding non-defective items significantly outperforms both the indirect approach (“InDirAl”), where defective items are identified first and the non-defective items are subsequently chosen from the complement set [21], and the item-by-item testing approach (“NA1by1”).
Fig. 4. Comparison of CoAl with the scaled information theoretic lower bounds. Here, the lower bounds have been scaled by a multiplicative factor of log(K). The close agreement of the scaled lower bound with the performance of the algorithm shows that CoAl is within a log(K) factor of the lower bounds.

Appendix 

A. Proof of Lemma 1

We note that a test outcome is 0 only if none of the K defective items participates in the test and the output is not corrupted by additive noise. Part (a) now follows by noting that the probability that an item does not participate in the group test is $(1-p) + pu$. Part (b) follows from (1). For (c), we note the following. Given that $X_{li} = 1$ for some $i \in S_d$, the outcome is 0 only if three conditions hold: the ith item does not participate in the test (despite $X_{li} = 1$); none of the remaining $K-1$ defective items participates in the test (either the entry of the test matrix is zero or the item gets diluted out by noise); and the test outcome is not corrupted by additive noise. That is, $P(Y_l = 0 \mid X_{li} = 1) = u\left(1-(1-u)p\right)^{K-1}(1-q) = \gamma_0\Gamma$. The other part follows similarly. Part (d) follows by noting that, for any $i \in S_d$ and $j \notin S_d$, $P(Y_l \mid X_{li}, X_{lj}) = P(Y_l \mid X_{li})$. By Bayes' rule and part (b) of this lemma, we get $P(X_{li}, X_{lj} \mid Y_l) = \frac{P(Y_l \mid X_{li})P(X_{li})P(X_{lj})}{P(Y_l)} = P(X_{li} \mid Y_l)P(X_{lj} \mid Y_l)$. Hence the proof.
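Parts (a) and (c) of Lemma 1 can be sanity-checked by Monte Carlo simulation. This sketch assumes the test model described in this proof. Each defective item is pooled with probability p and, once pooled, is diluted out with probability u. An otherwise negative outcome is flipped by additive noise with probability q. Only the K defective items need to be simulated, since non-defective items do not affect the outcome. The helper names and parameter values are illustrative, and the trial count is chosen so that the estimates lie well within the stated tolerances.

```python
import random

def is_negative(rng, K, p, u, q, first_included=None):
    """Simulate one test over the K defective items; True if the outcome is 0.

    If first_included is given, defective item 0 is forced into (or out of) the pool.
    """
    for k in range(K):
        if k == 0 and first_included is not None:
            included = first_included
        else:
            included = rng.random() < p
        if included and rng.random() >= u:  # pooled and not diluted out
            return False
    return rng.random() >= q  # no additive-noise flip

def estimate(K, p, u, q, trials=100_000, seed=1):
    """Empirical P(Y=0) and P(Y=0 | X_li = 1) for a defective item i."""
    rng = random.Random(seed)
    p0 = sum(is_negative(rng, K, p, u, q) for _ in range(trials)) / trials
    p0_given = sum(is_negative(rng, K, p, u, q, True) for _ in range(trials)) / trials
    return p0, p0_given

K, p, u, q = 4, 0.2, 0.1, 0.1
Gamma = (1 - q) * (1 - (1 - u) * p) ** K
gamma0 = u / (1 - (1 - u) * p)
p0, p0_given = estimate(K, p, u, q)
print(f"P(Y=0): {p0:.4f} vs Gamma = {Gamma:.4f}")
print(f"P(Y=0 | X=1): {p0_given:.4f} vs gamma0*Gamma = {gamma0 * Gamma:.4f}")
```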
Fig. 5. Variation of the average probability of error with (a) additive noise (q) and (b) dilution noise (u).

B. Proof of (11) and (12)

For $i \in S_d$ and $j \notin S_d$, let $x_j(l) = X_{lj}$, $x_i(l) = X_{li}$ and $y(l) = Y_l$. For any $k \in [N]$, $T(k)$ can be written as a sum of M independent random variables, $\sum_{l=1}^{M} Z_{kl}$. Here $Z_{kl}$ takes the value 1 with probability $P(X_{lk} = 1, Y_l = 0)$, the value $-\psi_{cb}$ with probability $P(X_{lk} = 1, Y_l = 1)$, and the value 0 otherwise. From Lemma 1, we know that $P(Y_l = 0 \mid X_{li} = 1) = \gamma_0\Gamma$ and $P(Y_l = 0 \mid X_{lj} = 1) = \Gamma$, and thus (11) follows. Further, (12) follows by noting that

$$\mathrm{Var}(Z_{jl}) \le E(Z_{jl}^2) = p\left(\Gamma + \psi_{cb}^2(1-\Gamma)\right),$$

$$\mathrm{Var}(Z_{il}) \le E(Z_{il}^2) = p\left(\gamma_0\Gamma + \psi_{cb}^2(1-\gamma_0\Gamma)\right).$$


C. Proof of Proposition 2

We first prove that $\lambda_2(i) = 0$ for all $i \in S_L$. The proof is by contradiction. Suppose $\exists\, j \in S_L$ such that $\lambda_2(j) > 0$. From the complementary slackness conditions (25), this implies $z(j) = 1$, and thus $\lambda_1(j) = 0$. Since the jth item is amongst the smallest L entries, all $N-L$ entries outside $S_L$ are also equal to 1, so that $\mathbf{1}_N^T \mathbf{z} > (N-L)$. Hence, $\nu = 0$. From the zero-gradient condition (24), it follows that $\mathbf{1}_{M_z}^T X_z(:,j) = -\lambda_2(j) < 0$. This is not possible, since all entries of X are nonnegative. It then follows that $\lambda_2(i) = 0 \ \forall i \in S_L$. Thus, if $\lambda_2(i) > 0 \ \forall i \in S_d$, then these items cannot be among the smallest L entries of the primal solution $\mathbf{z}$, i.e.,

$$S_d \cap S_L = \emptyset.$$
D. Proof of Proposition 4

Suppose $\nu < \theta_0$. Then $\exists\, i$ such that $\lambda_1(i) = 0$ and $\nu < \mathbf{1}_{M_z}^T X_z(:,i)$. Thus, from (24), $\lambda_2(i) = \nu - \mathbf{1}_{M_z}^T X_z(:,i) < 0$, which violates the dual feasibility conditions (26). Hence, $\nu \ge \theta_0$. Similarly, suppose $\nu > \theta_1$. Then $\exists\, i$ such that $\lambda_1(i) > 0$ and $\nu > \mathbf{1}_{M_z}^T X_z(:,i)$. Thus, from (24), $\lambda_2(i) = \lambda_1(i) + \nu - \mathbf{1}_{M_z}^T X_z(:,i) > 0$. This is a contradiction, since $\lambda_1(i) > 0$ implies $\lambda_2(i) = 0$. Hence, $\nu > \theta_1$ is not possible.


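E. Affine characterization of the function $H_b(\alpha)/(1-\alpha)$

Lemma 2. Let $H_b(\cdot)$ represent the binary entropy function. Then, for $0 < \alpha < \alpha_h < 1$, there exist positive absolute constants $c_0, c_1 > 0$, with $c_1$ depending on $\alpha_h$, such that

$$\frac{H_b(\alpha)}{1-\alpha} \le c_0 \alpha + c_1. \qquad (36)$$

To establish (36), we note that, using natural logarithms,

$$\begin{aligned} \frac{H_b(\alpha)}{1-\alpha} &= -\frac{\alpha}{1-\alpha}\log\alpha - \log(1-\alpha) = \frac{\alpha}{1-\alpha}\sum_{i=1}^{\infty}\frac{(1-\alpha)^i}{i} + \sum_{i=1}^{\infty}\frac{\alpha^i}{i} \\ &= \alpha\left(1 + \frac{1-\alpha}{2} + \frac{(1-\alpha)^2}{3} + \sum_{i=4}^{\infty}\frac{(1-\alpha)^{i-1}}{i}\right) + \alpha + \frac{\alpha^2}{2} + \sum_{i=3}^{\infty}\frac{\alpha^i}{i} \\ &\le \alpha\left(1 + \frac{1}{2} + \frac{1}{3}\right) + \frac{\alpha}{4}\sum_{i=4}^{\infty}(1-\alpha)^{i-1} + \alpha + \frac{\alpha^2}{2} + \frac{1}{3}\sum_{i=3}^{\infty}\alpha^i \\ &= \frac{17}{6}\alpha + \left(\frac{(1-\alpha)^3}{4} + \frac{\alpha^2}{2} + \frac{\alpha^3}{3(1-\alpha)}\right) \\ &\le c_0\alpha + c_1, \end{aligned}$$

where $c_0 = 17/6$ and $c_1$ is obtained by appropriately bounding the second term when $0 < \alpha < \alpha_h$. In particular, for $\alpha_h < 0.5$, the second term is at most 1/4, so $c_1 = 0.25$ satisfies (36).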
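The bound can also be checked numerically. The sketch below evaluates $H_b$ in nats, consistent with the series expansions above, and scans a grid in $(0, \alpha_h]$ with the hypothetical choice $\alpha_h = 0.5$. It checks both the intermediate inequality (the $\frac{17}{6}\alpha$ plus bracketed term) and the final affine bound with $c_0 = 17/6$, $c_1 = 0.25$. A grid scan is supporting evidence, not a proof.

```python
import math


def hb(a):
    """Binary entropy in nats (natural logarithm)."""
    return -a * math.log(a) - (1 - a) * math.log(1 - a)


def remainder(a):
    """Bracketed term left after extracting (17/6) * a in the derivation."""
    return (1 - a) ** 3 / 4 + a ** 2 / 2 + a ** 3 / (3 * (1 - a))


def check_lemma(alpha_h=0.5, c0=17 / 6, c1=0.25, n=10_000):
    """Largest value of H_b(a)/(1-a) - (c0*a + c1) over a grid in (0, alpha_h].

    A negative return value means the affine bound holds at every grid point.
    """
    worst = -math.inf
    for k in range(1, n + 1):
        a = alpha_h * k / n
        worst = max(worst, hb(a) / (1 - a) - (c0 * a + c1))
    return worst


if __name__ == "__main__":
    print("max violation of the affine bound:", check_lemma())
    print("remainder at alpha = 0.5:", remainder(0.5))
```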


F. Discussion on the theoretical guarantees for RoLpAl++ 

The discussion for RoLpAl++ proceeds along the same lines as that for RoLpAl. We use the same notation as in Section IV-B and, as before, we analyze an equivalent LP obtained by eliminating the equality constraints and replacing $(1 - z)$ by $z$. The corresponding KKT conditions for a pair of primal and dual optimal points are as follows:

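$$X_z^T \mathbf{1}_{M_z} - X_p^T \mu - \lambda_1 + \lambda_2 - \nu \mathbf{1}_N = \mathbf{0}_N, \qquad (37)$$

$$\mu \circ \big(X_p z - (1-\epsilon_0)\mathbf{1}_{M_p}\big) = \mathbf{0}_{M_p}; \quad \lambda_1 \circ z = \mathbf{0}_N; \quad \lambda_2 \circ (z - \mathbf{1}_N) = \mathbf{0}_N; \quad \nu\big(\mathbf{1}_N^T z - (N-L)\big) = 0, \qquad (38)$$

$$\mathbf{0}_N \preceq z \preceq \mathbf{1}_N; \quad \mathbf{1}_N^T z \ge (N-L); \quad X_p z \succeq (1-\epsilon_0)\mathbf{1}_{M_p}; \quad \mu \succeq \mathbf{0}_{M_p}; \quad \lambda_1 \succeq \mathbf{0}_N; \quad \lambda_2 \succeq \mathbf{0}_N; \quad \nu \ge 0. \qquad (39)$$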
In the above, $\mu \in \mathbb{R}^{M_p}$ is the dual variable associated with constraint (8) of LP1. Let $(z, \mu, \lambda_1, \lambda_2, \nu)$ be a primal-dual optimal point satisfying the above equations. We first prove the following:

Proposition 5. If $\lambda_2(i) > 0$, then $\mu^T X_p(:,i) = 0$.

Proof: For any $l \in [M_p]$, if $X_p(l,i) = 0$, then $\mu(l)X_p(l,i) = 0$. If $X_p(l,i) = 1$, then for the $l$th test $X_p(l,:)z \ge 1 > (1 - \epsilon_0)$, since $\lambda_2(i) > 0$ implies $z(i) = 1$. By complementary slackness (38), this implies $\mu(l) = 0$, and thus $\mu(l)X_p(l,i) = 0$. Hence the proposition follows. ■
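To make the KKT system (37) to (39) concrete, the sketch below computes the largest violation of these conditions at a candidate point. It reads (37) to (39) as the KKT conditions of minimizing $\mathbf{1}_{M_z}^T X_z z$ over the feasible set in (39), which is an assumption of this sketch. $X_z$ and $X_p$ are plain lists of 0/1 rows (one row per test); the function name, the toy instance and the hand-derived dual values are hypothetical, and in practice the duals would come from an LP solver.

```python
def kkt_violation(Xz, Xp, eps0, L, z, mu, lam1, lam2, nu):
    """Largest violation of the KKT conditions (37)-(39) at a candidate point.

    Xz: M_z x N list of 0/1 rows; Xp: M_p x N list of 0/1 rows.
    Returns zero for an exact primal-dual optimal pair.
    """
    n = len(z)
    col_z = [sum(row[k] for row in Xz) for k in range(n)]                     # X_z^T 1_{M_z}
    xp_mu = [sum(m * row[k] for m, row in zip(mu, Xp)) for k in range(n)]     # X_p^T mu
    slack = [sum(a * b for a, b in zip(row, z)) - (1 - eps0) for row in Xp]  # X_p z - (1-eps0) 1

    viol = []
    # (37): zero gradient of the Lagrangian
    viol += [abs(col_z[k] - xp_mu[k] - lam1[k] + lam2[k] - nu) for k in range(n)]
    # (38): complementary slackness
    viol += [abs(m * s) for m, s in zip(mu, slack)]
    viol += [abs(a * b) for a, b in zip(lam1, z)]
    viol += [abs(a * (b - 1)) for a, b in zip(lam2, z)]
    viol.append(abs(nu * (sum(z) - (n - L))))
    # (39): primal feasibility
    viol += [max(0.0, -zk, zk - 1) for zk in z]
    viol.append(max(0.0, (n - L) - sum(z)))
    viol += [max(0.0, -s) for s in slack]
    # (39): dual feasibility
    viol += [max(0.0, -v) for v in [*mu, *lam1, *lam2, nu]]
    return max(viol)


if __name__ == "__main__":
    # Hypothetical toy instance: N = 3 items, L = 1, one test in X_z and one in X_p.
    Xz, Xp, eps0, L = [[1, 0, 0]], [[0, 1, 1]], 0.5, 1
    z = [0, 1, 1]
    print(kkt_violation(Xz, Xp, eps0, L, z, [0], [1, 0, 0], [0, 0, 0], 0))
    print(kkt_violation(Xz, Xp, eps0, L, z, [0], [0.5, 0, 0], [0, 0.5, 0.5], 0.5))
    print(kkt_violation(Xz, Xp, eps0, L, z, [1], [1, 0, 0], [0, 0, 0], 0))
```

The first two calls should report zero violation, and the third (with $\mu(1) = 1$ on a slack positive test) should report 1.5. In the second dual point, $\lambda_2(i) > 0$ for the last two items and, consistent with Proposition 5, $\mu^T X_p(:,i) = 0$ for both.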

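Using the above, it is easy to see that Proposition 2 holds in this case also. Furthermore, using the same arguments as in Section IV-B, it can be shown that the error event associated with RoLpAl++, $\mathcal{E}$, satisfies $\mathcal{E} \subseteq \bigcup_{i \in S_d} \bigcup_{S_z \in \mathcal{S}_z} \big\{ \bigcap_{j \in S_z} \mathcal{E}_0(i,j) \big\}$, where

$$\mathcal{E}_0(i,j) = \big\{ \mathbf{1}_{M_z}^T X_z(:,i) - \mu^T X_p(:,i) - \lambda_1(i) > \mathbf{1}_{M_z}^T X_z(:,j) - \mu^T X_p(:,j) \big\}, \qquad (40)$$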

with the notation for $S_z$ and $\mathcal{S}_z$ as defined at the beginning of Section V. In the following discussion, $i$ is fixed, so for notational simplicity we write $\mathcal{E}_0(j) \equiv \mathcal{E}_0(i,j)$. Note that, for RoLpAl, the error event is upper bounded by a similar expression, but with $\mathcal{E}_0(j)$ replaced by $\mathcal{E}_1(j) = \{\mathbf{1}_{M_z}^T X_z(:,i) > \mathbf{1}_{M_z}^T X_z(:,j)\}$. In order to analytically compare the performance of RoLpAl and RoLpAl++, we try to relate the events $\mathcal{E}_0(j)$ and $\mathcal{E}_1(j)$. Note that if $\mathcal{E}_0(j) \subseteq \mathcal{E}_1(j)$, then $P(\mathcal{E}_0(j)) \le P(\mathcal{E}_1(j))$, and hence RoLpAl++ would perform at least as well as RoLpAl. Now, when $\mu = \mathbf{0}_{M_p}$, $\mathcal{E}_0(j) \subseteq \mathcal{E}_1(j)$ for all $j \in S_z$. For $\mu \ne \mathbf{0}_{M_p}$, we divide the items in $S_z$ into two disjoint groups:

(a) $\lambda_2(j) > 0$: Since $\mu^T X_p(:,j) = 0$ (Proposition 5), $\mu^T X_p(:,i) \ge 0$ and $\lambda_1(i) \ge 0$, it follows that $\mathcal{E}_0(j) \subseteq \mathcal{E}_1(j)$.

(b) $\lambda_2(j) = 0$: We note that $\mathcal{E}_0(j) \subseteq \mathcal{E}_1(j) \cup \mathcal{E}_1'(j)$, where $\mathcal{E}_1'(j) = \{\mu^T[X_p(:,j) - X_p(:,i)] > \kappa + \lambda_1(i)\}$ and $\kappa + \lambda_1(i) > 0$.
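The two inclusions can be sanity-checked on random numbers. In the sketch below, `ai` and `aj` stand for $\mathbf{1}_{M_z}^T X_z(:,i)$ and $\mathbf{1}_{M_z}^T X_z(:,j)$, and `mi` and `mj` for $\mu^T X_p(:,i)$ and $\mu^T X_p(:,j)$. All values are drawn at random (a hypothetical choice), subject only to $\mu \succeq 0$ and $\lambda_1(i) \ge 0$. Case (a) is checked directly. For case (b), the sketch does not assume a specific form for $\kappa$; it checks only the weaker consequence of (40) that, on $\mathcal{E}_0(j) \setminus \mathcal{E}_1(j)$, $\mu^T[X_p(:,j) - X_p(:,i)] > \lambda_1(i)$.

```python
import random


def events(ai, aj, mi, mj, lam1_i):
    """Indicators of E_0(j) from (40) and of E_1(j) for a single pair (i, j)."""
    e0 = ai - mi - lam1_i > aj - mj
    e1 = ai > aj
    return e0, e1


def check_inclusions(trials=10_000, seed=1):
    """Return True if no random draw violates the two inclusions."""
    rng = random.Random(seed)
    for _ in range(trials):
        ai, aj = rng.randint(0, 10), rng.randint(0, 10)    # column sums of X_z are integers
        mi = rng.uniform(0.0, 5.0)                          # mu >= 0 and X_p is 0/1, so mi >= 0
        lam1_i = rng.choice([0.0, rng.uniform(0.0, 2.0)])   # lambda_1(i) >= 0
        # Case (a): mu^T X_p(:, j) = 0, so E_0(j) must imply E_1(j).
        e0, e1 = events(ai, aj, mi, 0.0, lam1_i)
        if e0 and not e1:
            return False
        # Case (b), weak form: on E_0(j) \ E_1(j), mu^T [X_p(:, j) - X_p(:, i)] > lambda_1(i).
        mj = rng.uniform(0.0, 5.0)
        e0, e1 = events(ai, aj, mi, mj, lam1_i)
        if e0 and not e1 and not (mj - mi > lam1_i):
            return False
    return True


if __name__ == "__main__":
    print(check_inclusions())
```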

A technical difficulty, which prevents us from stating a categorical performance result, arises at this point: it is difficult to obtain estimates of the dual variable $\mu$, and hence of $P(\mathcal{E}_1'(j))$. Therefore, we offer two intuitive arguments that provide insight into the relative performance of RoLpAl++ and RoLpAl.
The first argument is that the majority of the items in $S_z$ will have $\lambda_2(j) > 0$, and thus, for a majority of the items in $S_z$, it follows that $P(\mathcal{E}_0(j)) \le P(\mathcal{E}_1(j))$. This is because the set $\{j : \lambda_2(j) = 0\}$ is given by

$$\Big\{ j : \big(\mathbf{1}_{M_z}^T X_z(:,j) - \mu^T X_p(:,j)\big) = \max_{l} \big(\mathbf{1}_{M_z}^T X_z(:,l) - \mu^T X_p(:,l)\big) \Big\}, \qquad (41)$$

and, as the number of tests increases and the number of nonzero components of $\mu$ increases, the probability that the above equality holds becomes smaller and smaller. Second, for the small number of items $j \in S_z$ with $\lambda_2(j) = 0$, it is reasonable to expect that $P(\mathcal{E}_1'(j))$ will be small. This is because the probability that a defective item is tested in a pool with a positive outcome is higher than the probability that a non-defective item is tested in a pool with a positive outcome. Thus, the expected value of $\mu^T[X_p(:,j) - X_p(:,i)]$ will be negative for a nonnegative $\mu$, and, using concentration of measure arguments, we can expect $P(\mathcal{E}_1'(j))$ to be small. Thus, we expect RoLpAl++ to perform similarly to (or even better than) RoLpAl.
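The concentration argument can be illustrated with a small simulation. The sketch assumes, hypothetically, that in each of the $M_p$ positive tests the defective item $i$ is present with probability `q_def`, the non-defective item $j$ with a smaller probability `q_non`, and that each entry of $\mu$ is an independent Uniform(0, 1) weight. The actual dual variable $\mu$ is determined by the LP and is not independent of $X_p$, so this only illustrates the direction of the effect: as $M_p$ grows, the fraction of trials in which $\mu^T[X_p(:,j) - X_p(:,i)]$ is positive shrinks.

```python
import random


def tail_fraction(mp, q_def, q_non, trials=2000, seed=2):
    """Fraction of trials in which mu^T [X_p(:, j) - X_p(:, i)] > 0.

    Hypothetical model: in each of the mp positive tests, defective item i is
    present w.p. q_def, non-defective item j w.p. q_non < q_def, and mu(l) is an
    independent Uniform(0, 1) weight.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(mp):
            mu_l = rng.random()
            x_j = 1 if rng.random() < q_non else 0
            x_i = 1 if rng.random() < q_def else 0
            total += mu_l * (x_j - x_i)
        if total > 0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for mp in (10, 50, 200):
        # Each term has mean 0.5 * (q_non - q_def) < 0, so the positive tail shrinks with mp.
        print(mp, tail_fraction(mp, q_def=0.3, q_non=0.1))
```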

G. Chernoff Bounds 

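Theorem 4 (Bernstein Inequality [30]). Let $X_1, X_2, \ldots, X_n$ be independent real-valued random variables, and assume that $|X_i - E(X_i)| \le c$ with probability one. Let $X = \sum_{i=1}^{n} X_i$, $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$. Then, for any $\delta > 0$, the following hold:

$$P(X > \mu + \delta) \le \exp\left(-\frac{\delta^2}{2\sigma^2 + \frac{2}{3}c\delta}\right), \qquad (42)$$

$$P(X < \mu - \delta) \le \exp\left(-\frac{\delta^2}{2\sigma^2 + \frac{2}{3}c\delta}\right). \qquad (43)$$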
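As a numerical illustration of the upper-tail bound, the sketch below compares the standard Bernstein form $\exp(-\delta^2/(2\sigma^2 + 2c\delta/3))$ with a Monte Carlo estimate for a sum of i.i.d. three-point variables of the kind used in the proof of (11) and (12) (values 1, $-\psi$ and 0). The parameters and helper names are hypothetical, and $c$ is taken as the largest value of $|Z - E(Z)|$ over the support, which is the boundedness condition required by the standard form of Bernstein's inequality.

```python
import math
import random


def three_point_moments(p, r, psi):
    """Mean, variance and max |Z - E(Z)| for Z in {1, -psi, 0}."""
    mean = p * (r - psi * (1 - r))
    second = p * (r + psi ** 2 * (1 - r))
    c = max(abs(1 - mean), abs(-psi - mean), abs(mean))
    return mean, second - mean ** 2, c


def bernstein_upper_tail(delta, var, c):
    """Bernstein upper-tail bound: exp(-delta^2 / (2 var + 2 c delta / 3))."""
    return math.exp(-delta ** 2 / (2 * var + 2 * c * delta / 3))


def empirical_upper_tail(n, p, r, psi, delta, trials=5000, seed=3):
    """Monte Carlo estimate of P(X > E(X) + delta) for X a sum of n i.i.d. copies of Z."""
    rng = random.Random(seed)
    mean_x = n * three_point_moments(p, r, psi)[0]
    hits = 0
    for _ in range(trials):
        x = 0.0
        for _ in range(n):
            if rng.random() < p:                        # item pooled in this test
                x += 1.0 if rng.random() < r else -psi  # negative vs positive outcome
        if x > mean_x + delta:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, p, r, psi, delta = 100, 0.1, 0.6, 0.5, 5.0  # hypothetical parameters
    m, v, c = three_point_moments(p, r, psi)
    print(empirical_upper_tail(n, p, r, psi, delta), bernstein_upper_tail(delta, n * v, c))
```

The variance of the sum is $n$ times the per-summand variance because the summands are independent, which is why `n * v` is passed to the bound.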